Abstract

We show in some detail how to implement Shor's efficient quantum
algorithm for discrete logarithms for the particular case of elliptic curve
groups. It turns out that for this problem a smaller quantum computer
can solve problems further beyond current computing than for integer
factorisation. A 160 bit elliptic curve cryptographic key could be broken
on a quantum computer using around 1000 qubits while factoring the
security-wise equivalent 1024 bit RSA modulus would require about 2000
qubits. In this paper we only consider elliptic curves over GF(p) and not
yet the equally important ones over $GF(2^n)$ or other finite fields. The main
technical difficulty is to implement Euclid's gcd algorithm to compute
multiplicative inverses modulo p. As the runtime of Euclid's algorithm
depends on the input, one difficulty encountered is the "quantum halting
problem".

1 Introduction

In 1994 Peter Shor presented two efficient quantum algorithms [?] for
computational problems for which no polynomial time classical algorithms are
known. One problem is to decompose a (large) integer into its prime factors.
The other problem, which we consider here, is finding discrete logarithms over
finite groups. The classical complexity of this problem seems to depend
strongly on the underlying group. A case for which (known) classical
algorithms are particularly inefficient are elliptic curve groups defined over
finite fields. Actually most public key cryptography in use today relies
either on the presumed hardness of integer factoring (RSA) or that of discrete
logarithms over finite fields or elliptic curves.

Elliptic curve cryptography (ECC) is sometimes preferred because it allows
shorter key sizes than RSA. This is because the best classical integer
factoring algorithms (the number field sieve, see e.g. [?]), although
superpolynomial, have less than exponential complexities. Very roughly the
complexity is $O(e^{c \log^{1/3} n})$, where n is the integer to be factored.
On the other hand, for discrete logarithms over elliptic curves, nothing
better than "generic" algorithms is known, that is, algorithms which work for
any group. These algorithms, e.g. the Pollard $\rho$ algorithm [?], have truly
exponential complexity.

Shor's quantum algorithms for integer factoring and discrete logarithms have
about equal complexity, namely typically $O(n^3)$ (with n now denoting the bit
length of the input). Thus there is a larger complexity gap between classical
and quantum for discrete logarithms than for factoring.

Proposals have been made [?, 5] for optimised implementations of the quantum
factoring algorithm, in particular for minimising the number of qubits
needed. The best current result by S. Beauregard [3] is that about 2n qubits
are enough. We attempt here a similar optimisation for discrete logarithms
over elliptic curves. The implementation is more difficult, but we still get
an algorithm that uses fewer qubits and less time to solve a problem of
similar classical difficulty when compared to factoring. For problems that can
now barely be solved, the number of qubits is not much less than for
factoring, but in the future, with more powerful classical computers, the gap
will increase.

Elliptic curves used in cryptography [?] are defined either over the field
of arithmetic modulo a prime, thus GF(p), or over $GF(2^n)$. For our
implementation we need to do arithmetic operations in these fields, in
particular we must compute multiplicative inverses. For GF(p), this is done
with Euclid's algorithm for computing the greatest common divisor (gcd), or
rather the extended version of this algorithm. This algorithm can be adapted
to the case of any finite field $GF(p^n)$, but for n > 1 there is the added
concern of deciding how the elements of the field will be represented. So in
this paper we only consider elliptic curves over GF(p).

Still, the implementation of the extended Euclidean algorithm is the main
technical difficulty we encounter. Fortunately, the algorithm can be made
piecewise reversible, so that not too much "garbage" has to be accumulated. As
for the factoring algorithm, it is possible to run the whole algorithm with
$O(n)$ qubits. For our implementation of Euclid's algorithm to achieve the
classical time complexity of $O(n^2)$, it is necessary to terminate the steps
in the algorithm at different points, depending on the input. This is
difficult to achieve with acyclic circuits (which are necessary for
computations in "quantum parallelism"). We will relegate some of the more
cumbersome technical aspects of our solution to an appendix, and will also
discuss possible other approaches.

In trying to optimise our implementation, we were guided by practical
considerations, although to do this, one would really have to know how an
actual quantum computer will look. We put most emphasis on minimising the
number of qubits, but also on the total number of gates. We assume that
whatever can be computed classically should be done classically, as long as it
doesn't take an unreasonable amount of computation. Basically we are trying to
optimise a quantum circuit where a gate can act on any pair of qubits, but it
turns out that most gates are between neighbours like in a cellular automaton.
In contrast to the earlier papers [?, 5] optimising the quantum factoring
algorithm, we have not thought about parallelising the algorithm, although
this may well be of interest for actual implementations.

2 Review of the quantum algorithm for discrete 
logarithms 

2.1 The discrete logarithm problem (DLP) 

Let G be a finite cyclic group and let $\alpha$ be a generator of G. The
discrete logarithm problem over G to the base $\alpha$ is defined as: given an
element $\beta \in G$, determine the unique $d \in [0, |G| - 1]$ such that
$\alpha^d = \beta$. The integer d is denoted by $\log_\alpha \beta$. Note that
while G may be a subgroup of a non-abelian group, G being cyclic is always an
abelian group. Usually it is assumed that the order of G is known.

There are two general types of algorithms for solving DLPs. The first type, 
called generic algorithms, work for any group, as long as we have a (unique) 
representation of group elements and we know how to carry out the group 
operation. The best known classical generic algorithms have complexity equal 
to about the square root of the order of the group. Thus they are exponential 
in the number of bits necessary to describe the problem. 
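
As an illustration of a generic square-root algorithm, here is a minimal sketch of the baby-step giant-step method. It is not taken from this paper; the group $Z_{257}^*$, the generator 3 and the function name `bsgs` are toy choices made for the example. The routine only uses the group operation and a unique representation of elements, and needs about $\sqrt{|G|}$ group operations and table entries.

```python
from math import isqrt

def bsgs(alpha: int, beta: int, p: int, order: int) -> int:
    """Return d with alpha^d = beta (mod p), using about sqrt(order) steps."""
    m = isqrt(order) + 1
    # Baby steps: store alpha^j for j = 0 .. m-1.
    table = {}
    cur = 1
    for j in range(m):
        table.setdefault(cur, j)
        cur = cur * alpha % p
    # Giant steps: multiply beta repeatedly by alpha^(-m) until we hit the table.
    factor = pow(alpha, -m, p)  # negative exponents need Python 3.8+
    gamma = beta % p
    for i in range(m):
        if gamma in table:
            return (i * m + table[gamma]) % order
        gamma = gamma * factor % p
    raise ValueError("beta is not in the subgroup generated by alpha")

# Toy example: 3 generates Z_257^*, which has order 256.
p, alpha, d = 257, 3, 200
beta = pow(alpha, d, p)
recovered = bsgs(alpha, beta, p, 256)
```

For a 160 bit group order this approach would need on the order of $2^{80}$ operations, which is why such groups are considered secure against generic attacks.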

The second type of algorithms are the algorithms which rely on specific 
properties of the group or its representation. As shown in the examples be- 
low, some groups have group specific algorithms which can solve the DLP in 
subexponential or even polynomial time. 

2.1.1 Examples ($Z_N$ and $Z_p^*$)

Let N be a positive integer and consider the case when $G = Z_N$, the additive
group of integers modulo N. Here the generators of the group are precisely the
$\alpha \in G$ such that $\gcd(\alpha, N) = 1$ and the equation
$d \cdot \alpha \equiv \beta \pmod N$ can be solved by finding the
multiplicative inverse of $\alpha$ modulo N with the extended Euclidean
algorithm. Thus for this group the DLP can be solved in polynomial time
($O(\log_2^2 N)$).
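
The easy case above can be sketched in a few lines. This is only an illustration with made-up toy values (N = 1000, generator 7); the helper name `additive_dlog` is hypothetical. Python's built-in `pow` with exponent -1 computes the modular inverse.

```python
def additive_dlog(alpha: int, beta: int, N: int) -> int:
    """Solve d * alpha = beta (mod N) for a generator alpha of the additive group Z_N."""
    # pow(alpha, -1, N) returns the inverse of alpha modulo N (Python 3.8+)
    # and raises ValueError if gcd(alpha, N) != 1, i.e. if alpha is no generator.
    return (beta * pow(alpha, -1, N)) % N

N, alpha, d = 1000, 7, 123      # toy values chosen for illustration
beta = (d * alpha) % N
recovered = additive_dlog(alpha, beta, N)
```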

There are however groups for which the DLP is not so easy. Suppose that
$G = Z_p^*$, the multiplicative group modulo p, which is cyclic, and that
$\alpha$ is a generator of G. Then the DLP is equivalent to solving the
equation $\alpha^d \equiv \beta \pmod p$. There are no known classical
algorithms which can solve this problem in polynomial time. Still, like for
integer factoring, the best algorithms have a subexponential complexity.

Note that if G is a finite cyclic group of order N then G is isomorphic to
$Z_N$ in which the DLP is easy. Thus it is not the structure of a group, but
its representation, which can make its DLP difficult.

2.1.2 Discrete logarithms over elliptic curves 

Elliptic curves over $GF(p^n)$ are finite abelian groups. Given a point
$\alpha$ on an elliptic curve we can consider the difficulty of solving the
DLP in the cyclic subgroup generated by $\alpha$. For general elliptic curves
(trace not equal to zero or one) the DLP seems to be computationally quite
hard. In particular for these curves it is not known how to exploit the
representation of the group to help solve the DLP. Thus the best known
classical algorithms for the DLP on these elliptic curves are the generic
algorithms whose running times are exponential in the number of bits necessary
to describe the problem. This presumed classical hardness makes the groups
useful for cryptography and has led to systems based on these groups being
included in ANSI, IEEE and FIPS standards [?].

2.2 Shor's quantum algorithms 

Both of Shor's algorithms have later been understood as special cases of a
more general framework, namely the abelian hidden subgroup problem (see e.g.
[?]). While in the factoring algorithm we are looking at subgroups of the
group of integers, Z, in the discrete logarithm case, subgroups of $Z^2$ play
a role. In particular we are looking at sublattices of the lattice $Z^2$, thus
elements which can be written as integer linear combinations of two (linearly
independent) vectors in $Z^2$. Thus in a way the discrete logarithm algorithm
can be viewed as a 2 dimensional version of the factoring algorithm.

2.2.1 The order finding algorithm (factoring) 

The basis of the integer factoring algorithm is really an order finding
algorithm, which we briefly review here. We are given an element $\alpha$ in a
(finite) group G and want to find its order. That is, the smallest positive
integer r with $\alpha^r = e$, where e is the neutral element. To do this, we
prepare a (large) superposition of N "computational" basis states $|x\rangle$
and compute $\alpha^x$ in "quantum parallelism":

$$\frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x, 0\rangle \;\to\; \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x, \alpha^x\rangle$$

where N is much larger than any order r that we expect. Now imagine that
we measure the second register and get $\alpha^{x_0}$ (the measurement is not
actually necessary). Then the first register will be left in a superposition
of the form

$$c \cdot \sum_{k=0}^{\approx N/r} |x_0 + k \cdot r\rangle$$

where $x_0$ is a random number between 0 and $r - 1$ and c is a normalisation
constant. Now a quantum Fourier transform (of size N) will leave this register
in a superposition dominated by basis states that are close to multiples of
N/r. Thus a measurement will yield such a state with high probability. If N is
chosen larger than the square of any expected order r, it is possible to
calculate r from the observed state with high probability. Also N is chosen to
be a power of 2, as this gives the simplest "quantum fast Fourier transform"
(QFFT).

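The classical step of calculating r from the observed state is usually done with a continued fraction (best rational approximation) computation. The text above does not spell this out, so the following is only a sketch under that assumption, with toy numbers: the element 2 in $Z_{21}^*$ has order 6, we take N = 2^11 > 21^2, and we pretend the measurement returned the integer closest to 5N/6. The function name `order_candidate` is hypothetical.

```python
from fractions import Fraction

def order_candidate(c: int, N: int, r_max: int) -> int:
    """Denominator of the best approximation to c/N with denominator <= r_max."""
    return Fraction(c, N).limit_denominator(r_max).denominator

N = 2**11            # size of the Fourier transform, larger than r_max**2
r_max = 21           # a priori bound on the order (toy value)
c = round(5 * N / 6) # stand-in for a measured value close to a multiple of N/r
r = order_candidate(c, N, r_max)
is_order = pow(2, r, 21) == 1   # check the candidate classically
```

If the measured multiple shares a factor with r, the denominator is only a divisor of r, so in general the candidate has to be checked and the run possibly repeated.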
2.2.2 Assumption for discrete log: order is prime (and known) 

First let us justify a simplifying assumption that we make. We assume that the
order of the base $\alpha$ of the elliptic curve discrete logarithm is prime
and that we know this prime. This is true for the cases standardised for
cryptographic use [?, 7, 8]. Also, if we don't know the order of $\alpha$, we
can find it with the above order finding algorithm and also decompose it into
its prime factors with the integer factoring algorithm. Then there is a
standard way to reduce the DLP in a group with composite order, N, into
several DLPs with orders equal to the prime factors of N (see [12]). Thus our
simplifying assumption is really without loss of generality.

2.2.3 The discrete logarithm algorithm 

So we have $\alpha^q = e$, with q prime and $\beta = \alpha^d$ where d is
unknown and between 0 and $q - 1$. Consider the function
$f(x, y) = \alpha^x \beta^y$ for integers x and y. This function has two
independent "periods" in the plane $Z^2$, namely

$$f(x + q, y) = f(x, y) \quad \text{and} \quad f(x + d, y - 1) = f(x, y)$$

Thus all x, y with $f(x, y) = e$ define a sublattice of $Z^2$. The 2
dimensional Fourier transform then leads to the dual lattice from which d can
be determined. Note that $f(x, y)$ can be thought of as being defined over
$Z_q^2$ as $f(x, y) = f(x \bmod q, y \bmod q)$.
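
The two periods can be checked numerically on a toy instance. The sketch below uses the subgroup of order q = 11 generated by 2 in $Z_{23}^*$ and the secret d = 7; these values are made up for illustration and stand in for the elliptic curve group used later.

```python
p, q = 23, 11          # toy parameters: 2 has prime order 11 in Z_23^*
alpha, d = 2, 7
beta = pow(alpha, d, p)

def f(x: int, y: int) -> int:
    # pow accepts negative exponents together with a modulus (Python 3.8+)
    return pow(alpha, x, p) * pow(beta, y, p) % p

# Both periods (q, 0) and (d, -1) leave f unchanged.
periods_ok = all(
    f(x + q, y) == f(x, y) and f(x + d, y - 1) == f(x, y)
    for x in range(q) for y in range(q)
)
# The points with f(x, y) = e are exactly those with x + d*y = 0 (mod q).
kernel = [(x, y) for x in range(q) for y in range(q) if f(x, y) == 1]
```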

For didactic purposes let us first imagine that we knew a way to carry out
the quantum fast Fourier transform of order q (QFFT$_q$), as then the
algorithm would be particularly nice. (Actually it has been shown how to do
this approximately [?], but we won't use these constructions.) Then we start
with the following state of two quantum registers, and compute
$\alpha^x \beta^y$ in "quantum parallelism":

$$\frac{1}{q} \sum_{x=0}^{q-1} \sum_{y=0}^{q-1} |x, y\rangle \;\to\; \frac{1}{q} \sum_{x=0}^{q-1} \sum_{y=0}^{q-1} |x, y, \alpha^x \beta^y\rangle$$

Again, imagine that we now measure the last register (although this is again
not necessary). Then we will obtain a random element $\alpha^{x_0}$ of the
group generated by $\alpha$, where $x_0$ is between 0 and $q - 1$. We will
then find the first two registers in a superposition of all x, y with

$$\alpha^x \beta^y = \alpha^x (\alpha^d)^y = \alpha^{x_0}$$

Because the order of $\alpha$ is q, this is equivalent to

$$x + dy \equiv x_0 \pmod q$$

or equivalently $x = (x_0 - dy) \bmod q$. Thus for each y there is exactly one
solution, and so the state of the first two registers is:

$$\frac{1}{\sqrt{q}} \sum_{y=0}^{q-1} |x_0 - dy, y\rangle$$

Now we Fourier transform each of the two registers with our (hypothetical)
quantum Fourier transform of order q, which acts on basis states as

$$|z\rangle \;\to\; \frac{1}{\sqrt{q}} \sum_{z'=0}^{q-1} \omega_q^{z z'} |z'\rangle \qquad \text{where } \omega_q = e^{2\pi i/q}$$

We obtain

$$\frac{1}{\sqrt{q}} \, \frac{1}{q} \sum_{x',y'=0}^{q-1} \sum_{y=0}^{q-1} \omega_q^{(x_0 - dy) x'} \omega_q^{y y'} |x', y'\rangle$$

The sum over y is easy to calculate. It gives $q\,\omega_q^{x_0 x'}$ if
$y' \equiv d x' \pmod q$ and vanishes otherwise. Thus we get:

$$\frac{1}{\sqrt{q}} \sum_{x'=0}^{q-1} \omega_q^{x_0 x'} |x', y' = d x' \bmod q\rangle$$

We now see that the probability of measuring a basis state is independent of
$x_0$, thus it doesn't matter which $x_0$ we measured above. By measuring, we
obtain a pair x', y' from which we can calculate $d = y' (x')^{-1} \bmod q$ as
long as $x' \neq 0$. (The only disadvantage of allowing the order q not to be
a prime would be that we would require $\gcd(x', q) = 1$.)
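
The derivation above can be checked by brute force for a small prime. The following sketch classically evaluates the amplitudes after the two Fourier transforms of order q, starting from the state $\frac{1}{\sqrt{q}} \sum_y |x_0 - dy, y\rangle$; the values q = 11, d = 7 and $x_0$ = 3 are arbitrary toy choices, and the function name is hypothetical.

```python
import cmath

def output_distribution(q: int, d: int, x0: int) -> dict:
    """Measurement probabilities of (x', y') after QFT_q on both registers."""
    w = cmath.exp(2j * cmath.pi / q)
    probs = {}
    for xp in range(q):
        for yp in range(q):
            # Amplitude: (1/sqrt(q)) * (1/q) * sum_y w^((x0 - d*y) x' + y y')
            amp = sum(w ** (((x0 - d * y) % q) * xp + y * yp) for y in range(q))
            amp /= q * q ** 0.5
            prob = abs(amp) ** 2
            if prob > 1e-9:
                probs[(xp, yp)] = prob
    return probs

q, d, x0 = 11, 7, 3
probs = output_distribution(q, d, x0)
# Every outcome with x' != 0 yields the same value y' * (x')^(-1) mod q.
recovered = {yp * pow(xp, -1, q) % q for (xp, yp) in probs if xp != 0}
```

Only the q pairs with $y' = d x' \bmod q$ survive, each with probability 1/q, and changing $x_0$ leaves the distribution unchanged.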



2.2.4 Using a Fourier transform of order $2^n$ instead of q

In practice we will want to replace each of the two QFFT$_q$'s with a quantum
fast Fourier transform of order $2^n$ (QFFT$_{2^n}$), because this is easy to
implement. For the QFFT$_q$ above we will always obtain a pair x', y' with
$y' = d x' \bmod q$ in the final measurement. However, for the QFFT$_{2^n}$ we
will get a high probability of measuring a pair x', y' if

$$(x' q/2^n, \; y' q/2^n) \approx (k, dk) \quad \text{for some } k$$

For $2^n \approx q$ we have a good (constant) probability of getting the right
values in $Z_q^2$ by rounding. In appendix [?] we make this analysis in detail
and show that by investing a reasonable amount of classical post-processing we
can with probability close to 1 obtain the discrete logarithm with a single
run of the quantum algorithm. (Of course this is on a perfect, noise free,
quantum computer...) More classical post-processing increases the chances of
success because now we can try out several values in the vicinity of the
values x', y' which we measured. Also there is a tradeoff between n, thus the
number of qubits, and the success probability. For n increasing beyond
$\log_2 q$ the probability of failure decreases exponentially.



3 Elliptic curves 

As mentioned earlier, elliptic curves over finite fields form abelian groups.
We will now present a brief introduction to elliptic curves over fields of
characteristic not equal to 2 or 3 (i.e. $1 + 1 \neq 0$ and
$1 + 1 + 1 \neq 0$). For a more in depth introduction to elliptic curves and
their use in cryptography see [?].

Let K be a field of characteristic not equal to 2 or 3. An elliptic curve over
K is the set of solutions $(x, y) \in K \times K$ to the equation

$$E : y^2 = x^3 + ax + b \qquad (1)$$

where $a, b \in K$ are constants such that $4a^3 + 27b^2 \neq 0$, together
with the point at infinity, which is denoted O. The solutions to equation (1)
are called the finite points on the curve and together with O are called the
points on the curve. We will use E to denote the set of points on an elliptic
curve.

The group operation on the points is written additively and is defined as
follows. If $P \in E$ then $P + O = O + P = P$. If $P = (x_1, y_1)$,
$R = (x_2, y_2) \in E$ then

$$P + R = \begin{cases} O & \text{if } (x_2, y_2) = (x_1, -y_1), \\ (x_3, y_3) & \text{otherwise,} \end{cases} \qquad (2)$$

where $x_3 = \lambda^2 - (x_1 + x_2)$, $y_3 = \lambda (x_1 - x_3) - y_1$,

$$\lambda = \begin{cases} (y_2 - y_1)/(x_2 - x_1) & \text{if } P \neq R \\ (3 x_1^2 + a)/(2 y_1) & \text{if } P = R \end{cases}$$

and all operations are performed over the field K.

It is not hard to check that if $(x_1, y_1)$ and $(x_2, y_2)$ are on the
curve, then so is $(x_3, y_3)$ and thus the above operation is closed on E.
While not immediately clear, the points on the curve together with the above
operation form an abelian group (for a proof see [14]). It is clear from the
definition of the group operation that O is the identity element. If
$P = (x, y) \in E$ it follows directly from equation (1) that $R = (x, -y)$ is
also a point on the curve. Thus if $P = (x, y)$ then the inverse of P is
$(x, -y)$. Note that the elliptic curve group operation is defined differently
over fields of characteristic 2 or 3.
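
The addition rule (2) translates directly into code for K = GF(p). The sketch below is a classical illustration only; representing O by `None`, the function name `ec_add` and the toy curve $y^2 = x^3 + 2x + 2$ over GF(17) with the point (5, 1) are choices made for this example.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]   # None stands for the point at infinity O

def ec_add(P: Point, R: Point, a: int, p: int) -> Point:
    """Add two points on y^2 = x^3 + a*x + b over GF(p), p > 3 prime."""
    if P is None:
        return R
    if R is None:
        return P
    x1, y1 = P
    x2, y2 = R
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # R = -P, so P + R = O
    if P == R:
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p        # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

a, p = 2, 17
G = (5, 1)
G2 = ec_add(G, G, a, p)       # doubling
G3 = ec_add(G, G2, a, p)      # generic addition
zero = ec_add(G, (5, 16), a, p)   # adding the inverse (x, -y) gives O
```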

A famous theorem by Hasse states that if E is an elliptic curve defined over
GF(p) then the number of points on E, denoted #E, is $p + 1 - t$ where
$|t| \le 2\sqrt{p}$. This implies that the maximum bit size of the order of a
point is approximately the bit size of p.
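
For a small prime, Hasse's bound can be verified by counting points directly. The sketch below does this for the toy curve $y^2 = x^3 + 2x + 2$ over GF(17); the curve and the function name are illustrative choices, and brute-force counting is of course only feasible for tiny p.

```python
def count_points(a: int, b: int, p: int) -> int:
    """Number of points on y^2 = x^3 + a*x + b over GF(p), including O."""
    # How many y in GF(p) square to each residue.
    square_roots = {}
    for y in range(p):
        square_roots[y * y % p] = square_roots.get(y * y % p, 0) + 1
    finite = sum(square_roots.get((x**3 + a * x + b) % p, 0) for x in range(p))
    return finite + 1           # add the point at infinity

p = 17
n_points = count_points(2, 2, p)
t = p + 1 - n_points            # trace appearing in Hasse's theorem
hasse_ok = t * t <= 4 * p       # equivalent to |t| <= 2*sqrt(p)
```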
A particular elliptic curve, E, is specified by giving the base field, K, and
the constants $a, b \in K$ from equation (1). For our purposes the base field
K will always be GF(p) for some prime p > 3. In practice a and b are selected
such that the order of the group contains a large prime factor, q, as this is
necessary to make the discrete logarithm problem hard. For simplicity we shall
assume that p and q are approximately the same bit size.

3.1 Representing points on an elliptic curve 

Suppose we are given an elliptic curve, $E : y^2 = x^3 + ax + b$, over the
field GF(p) for some prime p > 3. In order for a quantum computer to calculate
discrete logarithms over E we would like to have a unique representation of
the points on E.

If P is a finite point on E then $P = (x, y)$, where x and y are integers
modulo p. Thus any finite point can be represented by a unique ordered pair
$(x, y)$ with $x, y \in \{0, 1, \ldots, p - 1\}$. Now all that remains is to
determine how O will be represented. As will be discussed in section 4.2, our
implementation will not actually require a representation of O. However, if a
representation was required we could simply pick an ordered pair $(x, y)$
which is not on the curve. For example, $(p, p)$ could be used to represent O
for any curve, while $(0, 0)$ could be used for any curve with $b \neq 0$.

4 Our implementation of the quantum algorithm 
for discrete logarithms over elliptic curves 

We consider an elliptic curve, E, over GF(p), where p is a large prime. The
base of the logarithm is a point $P \in E$ whose order is another (large)
prime q, thus $qP = O$. We want to compute the discrete logarithm, d, of
another point $Q \in E$, thus $Q = dP$. (Remember that we use additive
notation for the group operation, thus instead of a power of the base element,
we have an integer multiple.)

As discussed in section 2.2.3 we need to apply the following transformation

$$\frac{1}{2^n} \sum_{x=0}^{2^n-1} \sum_{y=0}^{2^n-1} |x, y\rangle \;\to\; \frac{1}{2^n} \sum_{x=0}^{2^n-1} \sum_{y=0}^{2^n-1} |x, y, xP + yQ\rangle$$

Thus we need a method of computing (large) integer multiples of group
elements. This can be done by the standard "double and add" technique. This is
the same technique used for the modular exponentiation in the factoring
algorithm, although there the group is written multiplicatively so it's called
the square and multiply technique. To compute $xP + yQ$, first we repeatedly
double the group elements P and Q, thus getting the multiples $P_i = 2^i P$
and $Q_i = 2^i Q$. We then add together the $P_i$ and $Q_i$ for which the
corresponding bits of x and y are 1, thus

$$xP + yQ = \sum_i x_i P_i + \sum_i y_i Q_i$$

where $x = \sum_i x_i 2^i$, $y = \sum_i y_i 2^i$, $P_i = 2^i P$ and
$Q_i = 2^i Q$. The multiples $P_i$ and $Q_i$ can fortunately be precomputed
classically. Then to perform the above transformation, start with the state
$\sum_{x,y} |x, y, O\rangle$. The third register is called the "accumulator"
register and is initialised with the neutral element O. Then we add the $P_i$
and $Q_i$ to this register, conditioned on the corresponding bits of x and y.
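
A classical sketch of this procedure, for a single basis state $|x, y\rangle$, may help. It precomputes the doublings $P_i$, $Q_i$ and then adds them into an accumulator, controlled by the bits of x and y. The toy curve ($y^2 = x^3 + 2x + 2$ over GF(17), P = (5, 1), d = 7), the use of `None` for O and all function names are assumptions made for this illustration.

```python
def ec_add(P, R, a, p):
    """Point addition on y^2 = x^3 + a*x + b over GF(p); None represents O."""
    if P is None:
        return R
    if R is None:
        return P
    (x1, y1), (x2, y2) = P, R
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == R:
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def doublings(P, n, a, p):
    """Classically precompute [P, 2P, 4P, ..., 2^(n-1) P]."""
    out = []
    for _ in range(n):
        out.append(P)
        P = ec_add(P, P, a, p)
    return out

def combine(x, y, Ps, Qs, a, p):
    """Compute xP + yQ by adding P_i, Q_i controlled on the bits of x and y."""
    acc = None                                  # accumulator starts at O
    for i, (Pi, Qi) in enumerate(zip(Ps, Qs)):
        if (x >> i) & 1:
            acc = ec_add(acc, Pi, a, p)
        if (y >> i) & 1:
            acc = ec_add(acc, Qi, a, p)
    return acc

def naive_multiple(k, P, a, p):
    """Reference implementation: add P to itself k times."""
    acc = None
    for _ in range(k):
        acc = ec_add(acc, P, a, p)
    return acc

a, p, n = 2, 17, 5
P = (5, 1)
Q = naive_multiple(7, P, a, p)                  # Q = dP with d = 7
Ps, Qs = doublings(P, n, a, p), doublings(Q, n, a, p)
result = combine(9, 3, Ps, Qs, a, p)            # 9P + 3Q = (9 + 3*7) P
```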

4.1 Input registers can be eliminated 

Here we show that the input registers, $|x, y\rangle$, can actually be shrunk
to a single qubit, thus saving much space. This is accomplished by using the
semiclassical quantum Fourier transform and is completely analogous to what
has been proposed for the factoring algorithm [?] (see also e.g. [?]).

Griffiths and Niu [?] have observed that the QFFT followed by a measurement
can be simplified. Actually it can be described as simply measuring each qubit
in an appropriate basis, whereby the basis depends on the previous measurement
results. (In accordance with quantum computing orthodoxy, we can also say that
before measuring the qubit in the standard basis, we apply a unitary
transformation which depends on the previous measurement results.) Note that
in the initial state $\frac{1}{2^n} \sum_{x,y=0}^{2^n-1} |x, y, O\rangle$ the
qubits in the x- and y-registers are actually unentangled. Each qubit is in
the state $(|0\rangle + |1\rangle)/\sqrt{2}$. Now we can see how these two
registers can be eliminated: We do n steps. In step number i we first prepare
a qubit in the state $(|0\rangle + |1\rangle)/\sqrt{2}$, then use it to
control the addition of $P_i$ (or $Q_i$) and finally we measure the control
qubit according to the semiclassical QFFT. In this QFFT the qubits have to be
measured in reversed order, thus from highest significance to lowest. Thus we
will need to proceed from the $i = n - 1$ step down to the $i = 0$ step, but
this is no problem.

In summary, we really only need the accumulator register. We are left being
required to carry out a number of steps whereby we add a fixed (classically
known) point $P_i$ (or $Q_i$) to a superposition of points. We are working in
the cyclic group generated by P, thus the effect of a fixed addition is to
"shift" the discrete logarithm of each element in the superposition by the
same amount. For this reason we shall refer to these additions of fixed
classical points as "group shifts". (That the group shifts are conditional on
a qubit makes it only insignificantly more difficult, as we will point out
later.) Thus we need unitary transformations $U_{P_i}$ and $U_{Q_i}$ which act
on any basis state $|S\rangle$ representing a point on the elliptic curve, as:

$$U_{P_i} : |S\rangle \to |S + P_i\rangle \quad \text{and} \quad U_{Q_i} : |S\rangle \to |S + Q_i\rangle$$

As explained in section 2.2.3 and appendix [?] it is sufficient to do n of
these steps for P and n for Q, thus a total of 2n, where $n \approx \log_2 q$.

4.2 Simplifying the addition rule

So we have already managed to decompose the discrete logarithm quantum
algorithm into a sequence of group shifts by constant classically known
elements. That is

$$U_A : |S\rangle \to |S + A\rangle \qquad S, A \in E \text{ and } A \text{ is fixed}$$

We propose to only use the addition formula for the "generic" case (i.e. for
$P + R$ where $P, R \neq O$ and $P \neq \pm R$) for the group operation,
although it wouldn't be very costly to properly distinguish the various cases.
Still, it's not necessary. First note that the constant group shifts $2^i P$
and $2^i Q$ are not equal to the neutral element O, because P and Q have order
a large prime. (If a group shift was O, we would of course simply do nothing.)
Still, we are left with three problems. First, that a basis state in the
superposition may be the inverse of the group shift. Second, that a basis
state in the superposition may equal the group shift. Lastly, that a basis
state in the superposition may be O. We argue that with a small modification
these will only happen to a small fraction of the superposition and thus the
fidelity lost is negligible.

To ensure a uniformly small fidelity loss, we propose the following
modification at the beginning of the DLP algorithm: choose (uniformly) at
random an element $k \cdot P \neq O$ in the group generated by P. Then we
initialise the accumulator register in the state $|k \cdot P\rangle$, instead
of $|O\rangle$. This overall group shift is irrelevant, as after the final
QFFT it only affects the phases of the basis states. Now on average in each
group shift step we "lose" only a fraction of 1/q of the superposition by not
properly adding inverses of points and an equal amount for not correctly
doubling points. Thus the total expected loss of fidelity from this error
during the 2n group shifts is $4n/q \approx 4 \log_2 q / q$ and is thus an
exponentially small amount. As the accumulator no longer begins in the state
$|O\rangle$, the superposition $|S\rangle$ to which $U_A$ will be applied can
only contain $|O\rangle$ if an inverse was (correctly) added in the previous
addition. Thus O being a basis state in the superposition will not cause any
further loss of fidelity.
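
To get a feeling for the size of this loss, one can plug in numbers. The short sketch below evaluates $4n/q$ for n = 160, using $2^{160}$ as a stand-in for a 160 bit prime order q (an assumption made only to fix the order of magnitude).

```python
n = 160             # bit length of the group order (illustrative)
q = 2**n            # stand-in for a 160 bit prime; only its size matters here
loss = 4 * n / q    # expected fidelity loss summed over the 2n group shifts
print(f"expected fidelity loss: {loss:.1e}")
```

The result is far below any realistic hardware error rate, which supports ignoring the special cases of the addition rule.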

4.3 Decomposition of the group shift 

The group shift is clearly reversible. A standard procedure might be to do
$|S, 0\rangle \to |S, S + A\rangle \to |0, S + A\rangle$ where in the last
step we would uncompute S by running the addition of $-A$ to $S + A$
backwards. Fortunately we can do better than this generic technique. In terms
of the coordinates of the points, the group shift is:

$$|S\rangle = |(x, y)\rangle \;\to\; |S + A\rangle = |(x, y) + (\alpha, \beta)\rangle = |(x', y')\rangle$$

Recall that $x = \alpha$ if and only if $(x, y) = \pm A$ and that this portion
of the superposition can be lost (see section 4.2). Thus we use the following
group operation formulas (see eq. (2)):

$$\lambda = \frac{y - \beta}{x - \alpha} = -\frac{y' + \beta}{x' - \alpha}, \qquad x' = \lambda^2 - (x + \alpha)$$

The second expression for $\lambda$ is not difficult to obtain: substitute
$y = \beta + \lambda (x - \alpha)$ into $y' = \lambda (x - x') - y$. It will
allow us to later uncompute $\lambda$ in terms of x', y'. Actually, when
computing $\lambda$ from x, y we can directly uncompute y, and similarly we
can get y' when uncomputing $\lambda$:

$$x, y \;\leftrightarrow\; x, \lambda \;\leftrightarrow\; x', \lambda \;\leftrightarrow\; x', y'$$

where a double-sided arrow ($\leftrightarrow$) indicates that we need to do
these operations reversibly, thus in each step we need also to know how to go
backward. Note that the decomposition of one large reversible step into
several smaller individually reversible ones is nice because it saves space,
as any "garbage" can be uncomputed in each small step. In more detail the
sequence of operations is:

$$x, y \;\leftrightarrow\; x - \alpha, \, y - \beta \;\leftrightarrow\; x - \alpha, \, \lambda = \frac{y - \beta}{x - \alpha} \;\leftrightarrow \qquad (3)$$

$$\leftrightarrow\; x' - \alpha, \, \lambda = -\frac{y' + \beta}{x' - \alpha} \;\leftrightarrow\; x' - \alpha, \, y' + \beta \;\leftrightarrow\; x', y'$$

where all the operations are done over GF(p). The second line is essentially
doing the operations of the first line in reverse. The first and last steps
are just modular additions of the constants $\pm\alpha$, $-\beta$. They
clearly need much less time (and also fewer qubits) than the multiplications
and divisions (see [?]), so we will ignore them when calculating the running
times. The operation in the middle essentially involves adding the square of
$\lambda$ to the first register. This operation, too, is relatively harmless.
It uses fewer "work" qubits than other operations and thus doesn't determine
the total number of qubits needed for the algorithm. Still, for time
complexity we have to count it as a modular multiplication (more about this
below). So a group shift requires two divisions, a multiplication and a few
additions/subtractions.
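
The sequence (3) can be traced classically on one basis state. The sketch below follows the steps with two working values `u` and `v` that play the role of the two registers; the function name, variable names and the toy curve $y^2 = x^3 + 2x + 2$ over GF(17) are illustrative assumptions. It uses $x' - \alpha = \lambda^2 - (x - \alpha) - 3\alpha$, which follows from $x' = \lambda^2 - (x + \alpha)$.

```python
def group_shift(x: int, y: int, alpha: int, beta: int, p: int):
    """Add the fixed point A = (alpha, beta) to S = (x, y); generic case only."""
    u = (x - alpha) % p                   # first register:  x - alpha
    v = (y - beta) % p                    # second register: y - beta
    v = v * pow(u, -1, p) % p             # division: second register becomes lambda
    u = (v * v - u - 3 * alpha) % p       # add lambda^2: first register becomes x' - alpha
    v = (-v * u) % p                      # multiplication: lambda becomes y' + beta
    return (u + alpha) % p, (v - beta) % p   # final constant additions give x', y'

# Toy check on y^2 = x^3 + 2x + 2 over GF(17): (6, 3) + (5, 1) = (10, 6).
shifted = group_shift(6, 3, 5, 1, 17)
```

Each line overwrites one register using only the other register and classical constants, which is what makes every step individually reversible.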

4.3.1 Divisions of the form x,y <-> x,y/x 

The remaining two operations are a division and a multiplication where one of
the operands is uncomputed in the process. The division is of the form
$x, y \leftrightarrow x, y/x$, where $x \neq 0$ (these are different x and y
than in the last section!). The multiplication in (3) is simply the division
run in the reverse direction. We further decompose the division into four
reversible steps:

$$x, y \;\overset{E}{\leftrightarrow}\; 1/x, y \;\overset{m}{\leftrightarrow}\; 1/x, y, y/x \;\overset{E}{\leftrightarrow}\; x, y, y/x \;\overset{m}{\leftrightarrow}\; x, 0, y/x$$

where the letters over the arrows are m for "multiplication" and E for
"Euclid's algorithm" for computing the multiplicative inverse modulo p. The
second m is really a multiplication run backwards to uncompute y.
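
The four steps can again be followed classically with three registers. In the sketch below the names `r0`, `r1`, `r2` and the function name are hypothetical, and Python's `pow(., -1, p)` stands in for the reversible Euclidean inversion discussed later.

```python
def divide_in_place(x: int, y: int, p: int):
    """Trace x, y -> x, 0, y/x over GF(p) through the four steps E, m, E, m."""
    r0, r1, r2 = x, y, 0
    r0 = pow(r0, -1, p)           # E: x -> 1/x
    r2 = (r2 + r0 * r1) % p       # m: third register receives (1/x) * y = y/x
    r0 = pow(r0, -1, p)           # E: 1/x -> x
    r1 = (r1 - r0 * r2) % p       # m run backwards: y - x * (y/x) = 0
    return r0, r1, r2

p = 257
x, y = 96, 5
r0, r1, r2 = divide_in_place(x, y, p)
```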

4.3.2 Modular multiplication of two "quantum" numbers 

Before concentrating on Euclid's algorithm, let's look at the modular
multiplications of the form $x, y \leftrightarrow x, y, x \cdot y$. In the
quantum factoring algorithm the modular exponentiation is decomposed into
modular multiplications. But there one factor is a fixed "classical" number.
Still, the situation when we want to act on superpositions of both factors is
not much worse. So we want to do (explicitly writing mod p for clarity):

$$|x, y\rangle \;\to\; |x, y, x \cdot y \bmod p\rangle$$

We now decompose this into a sequence of modular additions and modular
doublings:

$$x \cdot y = \sum_{i=0}^{n-1} x_i 2^i y = x_0 y + 2(x_1 y + 2(x_2 y + 2(x_3 y + \ldots))) \pmod p$$

So we do a series of the following operations on the third register:

$$A \;\leftrightarrow\; 2A \;\leftrightarrow\; 2A + x_i y \pmod p \qquad i = n - 1 \ldots 0$$

Modular doubling

The modular doubling is a standard doubling (a left shift by one bit) followed
by a reduction mod p. Thereby we either subtract p or don't do anything.
Whether we subtract p has to be controlled by a control qubit. At the end this
control qubit can be uncomputed simply by checking whether $2A \bmod p$ is
even or odd (because p is odd). For the addition or subtraction of a fixed
number like p we need n carry qubits, which have to be uncomputed by
essentially running the addition backwards (but not undoing everything!). To
do the reduction mod p we will now in any case subtract p, check whether the
result is negative, and depending on that, either only uncompute the carry
bits or undo the whole subtraction. In the end the operation is only slightly
more complicated than the addition of a fixed number.

Modular addition

The second step is a modular addition of the form
$|x, y\rangle \to |x, x + y \bmod p\rangle$. Again we first make a regular
addition. This is only slightly more complicated than the addition of a fixed
number (see e.g. [5] pp. 7,8). Then, again, we either subtract p or not. To
later uncompute the control bit which controlled this, we have to compare x
and $x + y \bmod p$, which essentially amounts to another addition. Thus
overall we have two additions.

So all together for the modular multiplication we have to do n steps, each
roughly consisting of 3 additions. So one multiplication involves some 3n
additions.
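
The decomposition into doublings and additions can be sketched classically as follows. The function names are hypothetical; the `assert` statements inside the helpers only document the two facts used above for uncomputing the control bit (the parity test after doubling, and the comparison after addition).

```python
def mod_double(A: int, p: int) -> int:
    """A -> 2A mod p for odd p and 0 <= A < p."""
    t = 2 * A
    ctrl = 1 if t >= p else 0       # control bit: do we subtract p?
    t -= ctrl * p
    assert ctrl == t % 2            # p odd: the bit equals the parity of the result
    return t

def mod_add(A: int, y: int, p: int) -> int:
    """A -> A + y mod p for 0 <= A, y < p."""
    t = A + y
    ctrl = 1 if t >= p else 0       # control bit: do we subtract p?
    t -= ctrl * p
    assert ctrl == (1 if t < A else 0)   # recoverable by comparing with the old value
    return t

def mod_mul(x: int, y: int, p: int, n: int) -> int:
    """x * y mod p from n doublings and up to n additions, high bit of x first."""
    A = 0
    for i in range(n - 1, -1, -1):
        A = mod_double(A, p)
        if (x >> i) & 1:            # addition controlled on bit x_i
            A = mod_add(A, y, p)
    return A

p, n = 257, 9                       # toy modulus with 9 bits
product = mod_mul(96, 83, p, n)
```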

5 The Extended Euclidean Algorithm 

Suppose A and B are two positive integers. The well known Euclidean algorithm
can be used to find the greatest common divisor of A and B, denoted
gcd(A, B). The basis of the algorithm is the simple fact that if q is any
integer then $\gcd(A, B) = \gcd(A, B - qA)$. This implies the gcd doesn't
change if we subtract a multiple of the smaller number from the larger number.
Thus the larger number can be replaced by its value modulo the smaller number
without affecting the gcd. Given A and B with A > B this replacement can be
accomplished by calculating $q = \lfloor A/B \rfloor$ and replacing A by
$A - qB$, where $\lfloor x \rfloor$ is the largest integer less than or equal
to x. The standard Euclidean algorithm repeats this replacement until one of
the two numbers becomes zero at which point the other number is gcd(A, B). The
table below illustrates the Euclidean algorithm.



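gcd(A, B)                                           | gcd(1085, 378)
integers                | quotient                  | integers    | quotient
(A, B)                  | q = $\lfloor A/B \rfloor$ | (1085, 378) | 2 = $\lfloor 1085/378 \rfloor$
(A - qB, B)             | q' = $\lfloor B/(A - qB) \rfloor$ | (329, 378) | 1 = $\lfloor 378/329 \rfloor$
(A - qB, B - q'(A - qB)) | q'' = ...                | (329, 49)   | 6 = $\lfloor 329/49 \rfloor$
...                     | ...                       | (35, 49)    | 1 = $\lfloor 49/35 \rfloor$
                        |                           | (35, 14)    | 2 = $\lfloor 35/14 \rfloor$
                        |                           | (7, 14)     | 2 = $\lfloor 14/7 \rfloor$
(gcd(A, B), 0)          |                           | (7, 0)      |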





It can be shown that the Euclidean algorithm will involve $O(n)$ iterations
(modular reduction steps) and has a running time of $O(n^2)$ bit operations,
where n is the bit size of A and B (see [20]).
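
The in-place variant illustrated in the table, where the larger of the two numbers is reduced while the pair keeps its order, can be sketched as follows. The function name and the return format are choices made for this example.

```python
def euclid_trace(A: int, B: int):
    """Run Euclid's algorithm on positive A, B; record (A, B, quotient) per step."""
    rows = []
    while A != 0 and B != 0:
        if A > B:
            q = A // B
            rows.append((A, B, q))
            A -= q * B              # replace A by A mod B
        else:
            q = B // A
            rows.append((A, B, q))
            B -= q * A              # replace B by B mod A
    return rows, (A, B)

rows, final = euclid_trace(1085, 378)
gcd_value = max(final)              # the non-zero entry is gcd(A, B)
```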

Again suppose that A and B are two positive integers. The extended Euclidean
algorithm can be used not only to find gcd(A, B) but also integers k and k'
such that $kA + k'B = \gcd(A, B)$. This follows from the fact that after each
iteration of the Euclidean algorithm the two integers are known integer
linear combinations of the previous two integers. This implies that the
integers are always integer linear combinations of A and B. The extended
Euclidean algorithm simply records the integer linear combinations of A and B
which yield the current pair of integers. Thus when the algorithm terminates
with (gcd(A, B), 0) or (0, gcd(A, B)) we will have an integer linear
combination of A and B which equals gcd(A, B).

Let us now turn our attention to finding $x^{-1} \pmod p$, for $x \neq 0$. If
the extended Euclidean algorithm is used to find integers k and k' such that
$kx + k'p = 1$ then $k = x^{-1} \pmod p$. Note that we are not interested in
the coefficient k' of p in the integer linear combination. Thus we need only
record the coefficient of x (and not p) in the extended Euclidean algorithm.

Hence to compute $x^{-1} \pmod p$ we will maintain two ordered pairs $(a, A)$
and $(b, B)$, where $A$ and $B$ are as in the Euclidean algorithm and $a$ and
$b$ record the coefficients of $x$ in the integer linear combinations. We shall
refer to these ordered pairs as Euclidean pairs. (Note that $A$ and $B$ will
equal $ax \pmod p$ and $bx \pmod p$.) We begin the algorithm with
$(a, A) = (0, p)$ and $(b, B) = (1, x)$. In each iteration we replace either
$(a, A)$ or $(b, B)$. If $A > B$ then we replace $(a, A)$ with
$(a - qb, A - qB)$, where $q = \lfloor A/B \rfloor$. Otherwise $(b, B)$ is
replaced with $(b - qa, B - qA)$, where $q = \lfloor B/A \rfloor$. The algorithm
terminates when one of the pairs is $(\pm p, 0)$, in which case the other pair
will be $(x^{-1}, 1)$ or $(x^{-1} - p, 1)$. We illustrate the algorithm in the
following table.



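| $x^{-1} \bmod p$: Euclidean pairs | quotient | $96^{-1} \bmod 257$: Euclidean pairs | quotient |
|---|---|---|---|
| $(0, p), (1, x)$ | $q = \lfloor p/x \rfloor$ | $(0, 257), (1, 96)$ | $2 = \lfloor 257/96 \rfloor$ |
| $(-q, p - qx), (1, x)$ | $q' = \lfloor x/(p - qx) \rfloor$ | $(-2, 65), (1, 96)$ | $1 = \lfloor 96/65 \rfloor$ |
| $(-q, p - qx), (1 + q'q, x - q'(p - qx))$ | | $(-2, 65), (3, 31)$ | $2 = \lfloor 65/31 \rfloor$ |
| | | $(-8, 3), (3, 31)$ | $10 = \lfloor 31/3 \rfloor$ |
| | | $(-8, 3), (83, 1)$ | $3 = \lfloor 3/1 \rfloor$ |
| $(-p, 0), (x^{-1}, 1)$ | | $(-257, 0), (83, 1)$ | |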





Note that at termination the Euclidean pairs will either be
$(-p, 0), (x^{-1}, 1)$ or $(x^{-1} - p, 1), (p, 0)$. In the latter case we have
to add $p$ to $x^{-1} - p$ to get the standard representation.
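The Euclidean-pair bookkeeping can be sketched classically as follows. This is
a plain (irreversible) Python illustration of the update rule described above,
assuming $p$ is prime and $1 \leq x < p$; the name `inverse_mod` is chosen for
this example only.

```python
def inverse_mod(x, p):
    """Compute x^{-1} mod p using the Euclidean pairs (a, A) and (b, B).

    Only the coefficient of x is tracked; the coefficient of p is ignored.
    Assumes gcd(x, p) = 1 and 1 <= x < p.
    """
    a, A = 0, p
    b, B = 1, x
    while A != 0 and B != 0:
        if A > B:
            q = A // B
            a, A = a - q * b, A - q * B
        else:
            q = B // A
            b, B = b - q * a, B - q * A
    # One pair is now (+-p, 0); the other holds x^{-1} or x^{-1} - p.
    k = b if A == 0 else a
    return k % p  # adds p when the result is x^{-1} - p


print(inverse_mod(96, 257))  # 83
```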

5.1 Stepwise reversibility 

A priori it's not clear whether an iteration of the extended Euclidean algorithm 
is reversible. In particular it's not clear whether the quotients q will need 
to be stored or if they can be uncomputed. If they need to be stored then 
this will constitute a considerable number of "garbage" bits which could only 
be uncomputed (in the usual way) once the whole inverse finding algorithm 
had finished. Fortunately it turns out that each iteration of the algorithm is 
individually reversible. 

Concretely let's look at uncomputing the quotient $q$ after an iteration which
transformed $(a, A), (b, B)$ into $(a - qb, A - qB), (b, B)$. We know that
$A > B$ and $q = \lfloor A/B \rfloor$. It is not hard to see that $a$ and $b$
will never have the same sign and that $A > B$ if and only if $|a| < |b|$.
Therefore $\lfloor -(a - qb)/b \rfloor = q$. Thus we see that while $q$ is
computed from the second components of the original Euclidean pairs, it can be
uncomputed from the first components of the modified Euclidean pairs.
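A small classical check of this identity, as a sketch only: after each
iteration the quotient is recomputed from the first components alone and
compared with the quotient obtained from the second components. The function
name `uncompute_check` is hypothetical. The check is restricted here to
$x < p/2$, an assumption of this sketch: when the first quotient is 1 the two
coefficients momentarily have equal magnitude, so the strict inequality
$|a| < |b|$ used above does not apply to the following step.

```python
def uncompute_check(x, p):
    """Run (a,A),(b,B) -> (b,B),(a-qb,A-qB) and recover each q afterwards.

    Returns a list of (q, q_back) pairs, where q_back is recomputed from the
    first components only, as floor(-(a - q*b) / b).
    """
    (a, A), (b, B) = (0, p), (1, x)
    pairs = []
    while B != 0:
        q = A // B                       # from the second components
        a_new, A_new = a - q * b, A - q * B
        q_back = (-a_new) // b           # Python's // is floor division
        pairs.append((q, q_back))
        (a, A), (b, B) = (b, B), (a_new, A_new)
    return pairs


print(uncompute_check(96, 257))
```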

5.2 Simple implementations with time $O(n^3)$ and $O(n^2 \log_2 n)$

While it is a relief that the extended Euclidean algorithm is piecewise
reversible, we are not at the end of our labours. Note that the number of
iterations (modular reductions) in the algorithm for $x^{-1} \bmod p$ depends on
$x$ in an unpredictable way. This is a problem because we want to apply this
algorithm to a superposition of many $x$'s. Still worse, even the individual
iterations take different times for different inputs $x$ when the algorithm is
implemented in an efficient way. Namely, the quotients $q$ tend to be small, and
we want to use algorithms in each iteration which exploit this fact, since for
small $q$ the steps can be made faster. Only then does the extended Euclidean
algorithm use time bounded by $O(n^2)$.

Suppose that in each iteration of the algorithm we use full sized divisions and
multiplications, that is, the ones which we would use if we expected full sized
$n$ bit numbers. These algorithms (e.g. the modular multiplication described in
section 4.3.2) consist of a fixed sequence of $O(n^2)$ gates and can work just
as well on a superposition of inputs. As there are $O(n)$ iterations, the
extended Euclidean algorithm would then use $O(n^3)$ gates.

5.2.1 Using bounded divisions 

The running time $O(n^3)$ can be improved by noting that large quotients $q$
are very rare. Actually in a certain limit the probability for the quotient to
be $q_0$ or more is given by
$P(q \geq q_0) = \log_2(1 + 1/q_0) \approx 1/(q_0 \ln 2)$ (see e.g. [21], Vol.
2, section 4.5.3). If we use an algorithm that works for all quotients with less
than, say, $3 \log_2 n$ bits, then the probability of error per iteration will
be $\approx 1/n^3$. Or, if acting on a large superposition, this will be the
fidelity loss. Because in the whole discrete logarithm algorithm we have
$O(n^2)$ such iterations ($O(n)$ iterations for each of the $O(n)$ group
shifts), the overall fidelity loss will only be of order $O(1/n)$. Still, even
with these bounded divisions the overall complexity of the extended Euclidean
algorithm would be $O(n^2 \log_2 n)$.

We would like to obtain a running time of $O(n^2)$, which would lead to an
$O(n^3)$ discrete logarithm algorithm. Our proposed implementation of the
extended Euclidean algorithm attains this $O(n^2)$ running time. Our
implementation is not only faster asymptotically, but also for the sizes $n$ of
interest, although only by a factor of 2 to 3.
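The orders of magnitude quoted above can be checked numerically. The sketch
below simply evaluates the limiting tail probability
$\log_2(1 + 1/q_0)$ at $q_0 = n^3$ and multiplies it by an assumed iteration
count; the function names and the choice $n = 163$ are illustrative
assumptions, not taken from the paper.

```python
import math


def tail_probability(q0):
    """Limiting probability that a quotient is q0 or more: log2(1 + 1/q0)."""
    return math.log1p(1.0 / q0) / math.log(2.0)


def total_fidelity_loss(n, iterations):
    """Rough estimate: iterations times P(quotient needs >= 3*log2(n) bits)."""
    return iterations * tail_probability(n ** 3)


n = 163
print(tail_probability(n ** 3))        # close to 1 / (n^3 ln 2)
print(total_fidelity_loss(n, n * n))   # of order 1/n
```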

5.3 Our proposed implementation 

We have investigated various efficient implementations of the extended Eu- 
clidean algorithm. Fortunately, the one presented here is one of the simpler 
ones. To get an 0(n 2 ) algorithm, we will not require all the basis states in the 
superposition to go through the iterations of Euclid's algorithm synchronously. 
Rather we will allow the computation for each basis state to proceed at its own 
pace. Thus at a given time, one computation may be in the course of the 10- 
th iteration, while another one is still in the 7-th iteration. Later the second 
computation may again overtake the first one. 

The basic observation of our implementation is that it consists of only five 
different operations, most of which are essentially additions and subtractions. 
Thus each of the many "quantum-parallel" computations (thus each basis state) 
can store in a few flag bits which one of these five operations it needs. The 
implementation can then simply repeatedly cycle through the five operations 
one after the other, each one conditioned on the flag qubits. Thus as each 
operation is applied only those basis states which require it will be affected. 
For each cycle through the five operations the flag bits of a given computation 
will often only allow one operation to be applied to the computation. Therefore 
we lose a factor of somewhat less than five in speed relative to a (reversible)
classical implementation.

5.3.1 Desynchronising the parallel computations

Let us first explain in more detail the general approach to desynchronising the
"quantum-parallel" computations. Suppose, for example, that there are only
three possible (reversible) operations $o_1$, $o_2$ and $o_3$ in a computation.
Suppose further that each computation consists of a series of $o_1$'s then
$o_2$'s, $o_3$'s and so on cyclically. E.g. we would like to apply the following
sequences of operations to two different basis states:

$$\ldots\; o_2 o_2 o_2 \;\; o_1 \;\; o_3 o_3 \;\; o_2 \;\; o_1 o_1 o_1 o_1 \; |x\rangle$$

$$\ldots\; o_2 \;\; o_1 o_1 o_1 \;\; o_3 \;\; o_2 o_2 o_2 o_2 \;\; o_1 \; |x'\rangle$$

Clearly there must be a way for the computation to tell when a series of
$o_i$'s is finished and the next one should begin. But because we want to do
this reversibly, there must also be a way to tell that an $o_i$ is the first in
a series. Say we include in each $o_i$ a sequence of gates which flips a flag
qubit $f$ if $o_i$ is the first in a sequence and another mechanism that flips
it if $o_i$ is the last in a sequence. (If there is a single $o_i$ in a
sequence, and thus $o_i$ is both the first and the last $o_i$, then $f$ is
flipped twice.)

We will also make use of a small control register $c$ to record which operation
should be applied to a given basis state. Thus we have a triple $x, f, c$ where
$x$ stands for the actual data. We initialise both $f$ and $c$ to 1 to signify
that the first operation will be the first of a series of $o_1$ operations. The
physical quantum-gate sequence which we apply to the quantum computer is:

$$\ldots\; ac\; o'_1\; ac\; o'_3\; ac\; o'_2\; ac\; o'_1\; ac\; o'_3\; ac\; o'_2\; ac\; o'_1\; |QC\rangle$$

Where the $o'_i$ are the $o_i$ conditioned on $i = c$ and $ac$ stands for
"advance counter". These operations act as follows on the triple:

$$o'_i:\quad \text{if } i = c:\quad x, f, c \;\leftrightarrow\; o_i(x),\; f \oplus \text{first} \oplus \text{last},\; c$$

$$ac:\quad x, f, c \;\leftrightarrow\; x,\; f,\; (c + f) \bmod 3$$

Where $o'_i$ doesn't do anything if $i \neq c$, $\oplus$ means XOR and
$(c + f) \bmod 3$ is taken from $\{1, 2, 3\}$. In the middle of a sequence of
$o_i$'s the flag $f$ is 0 and so the counter doesn't advance. The last $o_i$ in
a series of $o_i$'s will set $f = 1$ and thus the counter is advanced in the
next $ac$ step. Then the first operation of the next series resets $f$ to 0, so
that this series can progress.
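The flag and counter mechanism can be simulated classically for a single basis
state. In the sketch below the "data" is modelled, purely as an assumption of
this example, by a list of series lengths from which the first/last conditions
are read off; in the actual algorithm these conditions are computed from the
registers. The name `run_schedule` is hypothetical.

```python
def run_schedule(series_lengths, cycles):
    """Simulate the fixed gate sequence o'_1 ac o'_2 ac o'_3 ac, repeated.

    series_lengths[k] is the length of the k-th series; series k consists of
    operation o_{(k mod 3) + 1}. Returns the operations actually applied.
    """
    f, c = 1, 1            # flag and control register, both initialised to 1
    k, pos = 0, 0          # toy data: current series and position within it
    applied = []
    for _ in range(cycles):
        for i in (1, 2, 3):
            # conditioned operation o'_i: acts only if i == c
            if i == c and k < len(series_lengths):
                assert i == k % 3 + 1          # sanity check of the mechanism
                first = pos == 0
                last = pos == series_lengths[k] - 1
                applied.append(i)
                pos += 1
                f ^= int(first) ^ int(last)    # flip on first and on last
                if last:
                    k, pos = k + 1, 0
            # advance counter: c <- (c + f) mod 3, taken from {1, 2, 3}
            c = (c - 1 + f) % 3 + 1
    return applied


print(run_schedule([4, 1, 2, 1, 3], 10))
print(run_schedule([1, 4, 1, 3, 1], 10))
```

Both basis states see the same fixed gate sequence, yet each applies its own
series of operations at its own pace.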

5.3.2 Applying this to the extended Euclidean algorithm 

Back to our implementation of the extended Euclidean algorithm. For reasons
which will be discussed below, in our implementation we always store the
Euclidean pair with the larger second coordinate first. Thus one (reversible)
iteration of the algorithm is:

$$(a, A), (b, B) \;\leftrightarrow\; (b, B), (a - qb, A - qB) \quad \text{where } q = \lfloor A/B \rfloor = \left\lfloor -\frac{a - qb}{b} \right\rfloor$$

This will be decomposed into the following three individually reversible steps:

$$A, B \leftrightarrow A - qB, B, q \qquad a, b, q \leftrightarrow a - qb, b \qquad \text{and SWAP}$$ (4)

where by "SWAP" we mean the switching of the two Euclidean pairs. Note that it
appears that all the bits of $q$ must be calculated before they can be
uncomputed. This would mean that the computation can not be decomposed further
into smaller reversible steps. We now concentrate on the first of these
operations, which starts with a pair $A, B$ of positive integers where $A > B$.
Actually, since $|a - qb| > |b|$, the second operation
$a, b, q \leftrightarrow a - qb, b$ can be viewed as the same operation run
backwards. The fact that $a$ and $b$ can be negative (actually they have
opposite sign) is only a minor complication.

So we want to do the division $A, B \leftrightarrow A - qB, B, q$ in a way
which takes less time for small $q$, namely we want to do only around
$\log_2 q$ subtractions. What we do is essentially grade school long division in
base 2. First we check how often we have to double $B$ to get a number larger
than $A$, and then we compute the bits of $q$ from highest significance to
lowest. In the first phase (operation 1) we begin with $i = 0$ and do a series
of operations of the form

$$A, B, i \;\leftrightarrow\; A, B, i + 1$$

As explained in section 5.3.1 we need to flip a flag bit $f$ at the beginning
and at the end of this series. The beginning is easily recognised as $i = 0$.
The end is marked by $2^i B > A$. Thus testing for the last step essentially
involves doing a subtraction in each step.

In the second phase (operation 2) we then diminish $i$ by 1 in each step:

$$A - q'B,\; B,\; i + 1,\; q' \;\leftrightarrow\; A - (q' + 2^i q_i)B,\; B,\; i,\; q' + 2^i q_i$$

where $q' = 2^{i+1} q_{i+1} + 2^{i+2} q_{i+2} + \ldots$ is the higher order bits
of $q$. The new bit $q_i$ is calculated by trying to subtract $2^i B$ from the
first register. This is easily done by subtracting and, if the result is
negative ($q_i = 0$), undoing the entire subtraction in the carry uncomputing
phase. The first operation in the second phase is recognised by checking
$q' = 0$ and the last by checking $i = 0$. Note that when incrementing and
decrementing $i$ we might actually want to shift the bits of $B$ to the left
(resp. right), as then the gates in the subtractions can be done between
neighbouring qubits.
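The first two phases amount to binary long division. The following classical
sketch, with the hypothetical name `divide`, counts the steps of each phase so
that one can see both take as many steps as $q$ has bits. It ignores
reversibility, the flag bit and the carry handling, which are assumptions left
out of this illustration.

```python
def divide(A, B):
    """Binary long division in two phases, for A >= B > 0.

    Phase 1 increments i until 2^i * B > A.
    Phase 2 decrements i, trying to subtract 2^i * B to obtain bit q_i.
    Returns (A - q*B, q, steps_phase1, steps_phase2).
    """
    assert A >= B > 0
    i = 0
    steps1 = 0
    while True:                      # phase 1 (operation 1)
        i += 1
        steps1 += 1
        if (B << i) > A:             # "last" condition of the series
            break
    q = 0
    steps2 = 0
    while i > 0:                     # phase 2 (operation 2)
        i -= 1
        steps2 += 1
        if A >= (B << i):            # subtraction succeeds: q_i = 1
            A -= B << i
            q |= 1 << i
    return A, q, steps1, steps2


print(divide(257, 96))  # (65, 2, 2, 2)
print(divide(31, 3))    # (1, 10, 4, 4)
```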

The third and fourth phases perform $a, b, q \leftrightarrow a - qb, b$ and are
essentially the reverses of phases two and one respectively. Thus operations
three and four are

$$a - qb + (q' + 2^i q_i)b,\; b,\; i,\; q' + 2^i q_i \;\leftrightarrow\; a - qb + q'b,\; b,\; i + 1,\; q'$$

and

$$a - qb,\; b,\; i \;\leftrightarrow\; a - qb,\; b,\; i - 1$$

where $q'$ and $q_i$ are as in phase two. The first and last operation
conditions of phase three are $i = 0$ and $q' = 0$. While the first and last
operation conditions of phase four are $|a - qb| < 2^{i+1}|b|$ and $i = 0$.
(These conditions are essentially the last and first operation conditions for
phases two and one respectively.)

Finally we also need to do the SWAP operation where we switch the two 
Euclidean pairs so that again their second components are in the order: larger 
to the left, smaller to the right. If we didn't do that, we would have to double 
the set of operations above, to take care of the case when the roles of the pairs 
are switched. The SWAP of course simply switches the registers qubit by qubit 
(although this has to be done conditional on the control register c). As every 
sequence of SWAP operations will be of length one, the flag bit $f$ can be left
untouched.

5.3.3 How many steps are necessary? 

With the SWAP we have a sequence of five operations which we repeatedly 
apply one after the other to the quantum computer. So the question is: How 
many times do we need to cycle through the five operations? Each iteration of 
Euclid's algorithm ends with a SWAP and thus requires an integer number of 
cycles through the five operations. Also note that in each iteration the length
of the sequence of operations $o_i$ is the same for all four phases. Thus for
one iteration the following operations might actually be applied to one
computation:

$$\text{SWAP}\;\; o_4 o_4 o_4 \;\; o_3 o_3 o_3 \;\; o_2 o_2 o_2 \;\; o_1 o_1 o_1 \; |x\rangle$$

The length $z$ of the sequences (here 3) is the bit length of the quotient $q$
in the iteration (i.e. $z = \lfloor \log_2 q \rfloor + 1$). If each operation is
done only once (so $z = 1$), everything will be finished with one cycle through
the five operations. In general the number of cycles for the iteration will be
$4(z - 1) + 1$.

Let $r$ be the number of iterations in a running of Euclid's Algorithm on
$p, x$, let $q_1, q_2, \ldots, q_r$ be the quotients in each iteration and let
$t$ be the total number of cycles required. Then

$$t = \sum_{i=1}^{r} \left(4\lfloor \log_2(q_i) \rfloor + 1\right) = r + 4\sum_{i=1}^{r} \lfloor \log_2(q_i) \rfloor$$

For $p > 2$ a bound on $t$ is $4.5 \log_2(p)$ (see appendix B). Thus we can
bound the number of cycles by $4.5n$.
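As an illustration, the cycle count $t$ can be evaluated for the worked example
$96^{-1} \bmod 257$ and compared with the bound $4.5 \log_2(p)$. The helper
names `quotients` and `cycles` are assumptions of this sketch.

```python
import math


def quotients(x, p):
    """Quotients q_1, ..., q_r of Euclid's algorithm run on p, x."""
    A, B = p, x
    qs = []
    while B != 0:
        qs.append(A // B)
        A, B = B, A % B
    return qs


def cycles(x, p):
    """t = sum over iterations of 4*floor(log2(q_i)) + 1."""
    return sum(4 * (q.bit_length() - 1) + 1 for q in quotients(x, p))


print(quotients(96, 257))        # [2, 1, 2, 10, 3]
print(cycles(96, 257))           # 29
print(4.5 * math.log2(257))      # bound on t for p = 257
```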

5.3.4 The quantum halting problem: a little bit of garbage 

Actually, because the inverse computation for each basis state has to be
reversible it can't simply halt when $B = 0$. Otherwise, when doing things
backward, we wouldn't know when to start uncomputing. This has been called the
"quantum halting problem", although it seems to have little in common with the
actual classical (undecidable) halting problem. Anyway, instead of simply
halting, a computation will have to increment a small ($\log_2 4.5n$ bit)
counter for each cycle after Euclid's algorithm has finished. Thus once per
cycle we will check if $B = 0$ to determine if the completion counter needs to
be incremented. This means that at the end of Euclid's algorithm we will
actually have a little "garbage" besides $x^{-1} \bmod p$. Also, at least one
other bit of garbage will be necessary because, as mentioned earlier, half the
time we get $x^{-1} - p$ instead of $x^{-1}$ itself. Note that in our
computation of $x, y \leftrightarrow x, y/x$ we use Euclid's algorithm twice,
once for $x \leftrightarrow x^{-1}$ and once for the converse (see section
4.3.1). Thus we can simply leave the garbage from the first time around and then
run the whole algorithm backwards.

5.3.5 Saving space: Bounded divisions and register sharing 

To solve the DLP our algorithm will need to run the Euclidean algorithm $8n$
times. Each running of the Euclidean algorithm will require at most $1.5n$
iterations. Thus the DLP algorithm will require at most $12n^2$ Euclidean
iterations. As mentioned in section 5.2.1 the probability for the quotient $q$
in a Euclidean iteration to have $c \log_2 n$ bits or more is
$\approx 1/n^c$. Thus by bounding $q$ to $3 \log_2 n$ bits (instead of $n$ bits)
the total loss of fidelity will be at most $12/n$.

Over the course of Euclid's algorithm, the first number $a$ in the Euclidean
pair $(a, A)$ gets larger (in absolute value), while $A$ gets smaller. Actually
the absolute value of their product is at most $p$: At any given time, we store
the two pairs $(a, A)$ and $(b, B)$. It is easy to check that $|bA - aB|$
remains constant and equals $p$ during the algorithm ($bA - aB$ simply changes
sign from one iteration to the next and the initial values are $(0, p)$ and
$(1, x)$). Now $p = |bA - aB| \geq |bA| \geq |aA|$, where we used that $a$ and
$b$ have opposite sign and $|a| < |b|$. So we see that $a$ and $A$ could
actually share one $n$-bit register. Similarly, since $|bB| < |bA| \leq p$, it
follows that $b$ and $B$ could also share an $n$-bit register.
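The invariant and the two product bounds can be checked classically. The sketch
below uses the unswapped form of the iteration from the start of section 5 (so
$bA - aB$ keeps its sign rather than alternating); the name
`check_invariants` is an assumption made for this example.

```python
def check_invariants(x, p):
    """Verify |bA - aB| = p, |aA| <= p and |bB| <= p after every iteration."""
    a, A, b, B = 0, p, 1, x
    while A != 0 and B != 0:
        if A > B:
            q = A // B
            a, A = a - q * b, A - q * B
        else:
            q = B // A
            b, B = b - q * a, B - q * A
        assert abs(b * A - a * B) == p
        assert abs(a * A) <= p and abs(b * B) <= p
    return True


print(all(check_invariants(x, 257) for x in range(1, 257)))
```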

The problem is that in the different "quantum parallel" computations, the
boundary between the bits of $a$ and those of $A$ (or $b$ and $B$) could be in
different places. It will be shown in section 5.4.3 that the average number of
cycles required is approximately $3.5n$. Thus on average after $r$ cycles we
would expect $A$ and $B$ to have size $n - r/3.5$ and the size of $a$ and $b$ to
be $r/3.5$. We shall define the "size perturbation" of a running of the extended
Euclidean algorithm as the maximum number of bits any of $A$, $B$, $a$ or $b$
reach above their expected sizes. Table 1 gives some statistics on size
perturbations for various values of $n$. For each value of $n$ in the table, the
size perturbations were calculated for one million runnings of Euclid's
algorithm (1000 random inverses for each of 1000 random primes). From the table
we see that the mean of the size perturbations ranges from $1.134\sqrt{n}$ for
$n = 110$ to $1.069\sqrt{n}$ for $n = 512$ and over all 6 million calculations
was never over $2\sqrt{n}$. By analyzing the distributions of the size
perturbations it was seen that for $n \in [110, 512]$ the distributions are
close to normal with the given means and standard deviations.

Thus one method of register sharing between $a$, $A$, $b$ and $B$ would be to
take the size of the registers to be their expected values plus $2\sqrt{n}$. In
this case the four registers could be stored in $2n + 8\sqrt{n}$ qubits (instead
of $4n$ qubits). Note that $a$, $A$, $b$ and $B$ are never larger than $p$, thus
when implementing the register sharing one would actually use the minimum of $n$
and the expected value plus $2\sqrt{n}$. As the amount of extra qubits added to
the expected sizes of the buffers was only found experimentally, we shall carry
through the analysis of the algorithm both with and without register sharing.

| $n$ | Mean Size Perturbation | Standard Deviation | Maximum Size Perturbation |
|---|---|---|---|
| 110 | 11.90 | 1.589 | 18 |
| 163 | 14.13 | 1.878 | 24 |
| 224 | 16.35 | 2.115 | 25 |
| 256 | 17.33 | 2.171 | 25 |
| 384 | 21.02 | 2.600 | 31 |
| 512 | 24.20 | 3.084 | 38 |

Table 1: Size perturbations during Euclid's Algorithm

5.4 Analysis of the Euclidean algorithm implementation 

The most basic operations, namely additions and subtractions, are in the end 
conditioned on several qubits, which seems to complicate things a lot. But before 
e.g. doing an addition we can simply compute the AND of these control qubits, 
put it into an auxiliary qubit, and use this single qubit to control the addition. 
Thus the basic operations will essentially be (singly) controlled additions and 
subtractions, as for the factoring algorithm. Detailed networks for this can e.g. 
be found in 

5.4.1 Running time: $O(n^2)$

Let us now analyze the running time of our implementation of the extended
Euclidean algorithm. The algorithm consists of $4.5n$ operation cycles. During
each of these cycles the halting register needs to be handled and each of the
five operations needs to be applied.

Handling the halting register requires checking if $B = 0$ and incrementing the
$\log_2 4.5n$ bit register accordingly. The following table summarises the
operations required in the first four operations of a cycle.





| Operation | Main Operation | First Check | Last Check |
|---|---|---|---|
| 1 | $z$-bit ADD | $z$-bit ZERO | $w$-bit SUB |
| 2 | $w$-bit SUB, $z$-bit SUB | $(3 \log_2 n)$-bit ZERO | $z$-bit ZERO |
| 3 | $w$-bit ADD, $z$-bit ADD | $z$-bit ZERO | $(3 \log_2 n)$-bit ZERO |
| 4 | $z$-bit SUB | $w$-bit SUB | $z$-bit ZERO |



Where $z = \log_2(3 \log_2 n)$ is the size of the register for $i$, $w$
represents the bit size of the registers for $a$, $A$, $b$ and $B$ ($w \leq n$
and depends on whether or not register sharing is being used), ZERO means a
compare to zero and the $w$-bit operations are applied to two quantum registers.
Lastly, the fifth operation of the cycle, SWAP, swaps two registers of size at
most $2n$.

Therefore each of the $4.5n$ cycles requires 4 $w$-bit additions/subtractions, a
SWAP and a $w$-bit compare to zero, all of which are $O(n)$. The running time of
the $w$-bit operations dominates the algorithm and leads to a running time of
$O(n^2)$.

5.4.2 Space: $O(n)$

Let us now determine the number of qubits required for our implementation of
the extended Euclidean algorithm. The largest storage requirement is for the
two Euclidean pairs $(a, A)$ and $(b, B)$, which as discussed in section 5.3.5
can be either $4n$ or $2n + 8\sqrt{n}$ bits depending on whether or not register
sharing is used. The next largest requirement is the $n$ bits needed for the
carry register during the additions and subtractions. The quotient $q$ will
require $3 \log_2 n$ bits (see section 5.3.5). The halting counter, $h$, will be
of size $\log_2 4.5n$, however since $h$ and $q$ are never required at the same
time they can share a register. The $i$ register needs to be able to hold the
bit size of the maximum allowed quotient ($3 \log_2 n$) and thus easily fits in
a $\log_2 n$ register. Lastly the algorithm requires a small fixed number
($< 10$) of bits for the flag $f$, the control register $c$ and any other
control bits. Thus we see that the algorithm requires approximately
$5n + 4 \log_2 n + \epsilon$ or $3n + 8\sqrt{n} + 4 \log_2 n + \epsilon$ bits
depending on whether register sharing is used. In either case we see that the
space requirement is $O(n)$.

5.4.3 Possible improvements and alternative approaches 

Here we list a few possible improvements and alternatives to our approach. It 
might also be that there are standard techniques, which we are not aware of, 
for finding (short) acyclic reversible circuits. 

Reducing the number of cycles 

While $4.5n$ is a limit on the maximum number of cycles required in the Euclidean
algorithm, there are very few inputs for which the algorithm actually approaches
this bound. For a prime $p$ let $L_q(p)$ be the number of times $q$ occurs as a
quotient when the Euclidean algorithm is run on $p, x$ for all $x$ satisfying
$1 \leq x < p$. In
P5) it was shown that 

$$L_q(p) = \frac{12(p - 1)}{\pi^2} \ln\left(\frac{(q + 1)^2}{(q + 1)^2 - 1}\right) \ln(p) + O\left(p(1 + 1/p)^3\right)$$

Using this fact, it can be shown that the total number of cycles required for
finding $x^{-1}$ for all $1 \leq x < p$ is

$$\sum_{q=1}^{p-1} L_q(p)\left(4\lfloor \log_2(q) \rfloor + 1\right) \approx (p - 1)\, 3.5 \log_2(p)$$

Thus the average number of cycles is approximately $3.5n$. Experiments
conducted seem to show that the distribution of required cycles is close to
normal with a standard deviation of around $\sqrt{n}$. Thus if we run the
quantum computer a few standard deviations beyond the average number of cycles,
nearly all computations will have halted, making the loss of fidelity minimal.
While this still leads to an $O(n^3)$ DLP algorithm, the constant involved will
have decreased.
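The average can also be estimated empirically for a small prime by brute force
over all inputs. This is only a rough classical sketch; the prime 65537 is an
arbitrary choice made here, and for such small sizes the ratio need not match
the asymptotic value of 3.5 closely. The helper names are hypothetical.

```python
import math


def cycles(x, p):
    """Number of cycles t for inverting x modulo p (see section 5.3.3)."""
    A, B = p, x
    t = 0
    while B != 0:
        q = A // B
        t += 4 * (q.bit_length() - 1) + 1
        A, B = B, A % B
    return t


def average_cycles(p):
    """Mean of cycles(x, p) over all 1 <= x < p."""
    return sum(cycles(x, p) for x in range(1, p)) / (p - 1)


p = 65537
print(average_cycles(p) / math.log2(p))   # to be compared with 3.5
```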

Reducing the number of carry qubits 

Actually the number of carry qubits can be reduced by "chopping" e.g. an 
n-qubit addition into several pieces and have the carry qubits not go much 
beyond one piece at a time. (Thereby we sacrifice some fidelity, see e.g. [5].) 
This procedure takes somewhat more time than a standard addition, but it may 
well make sense to reduce the number of carry qubits (currently n) by a factor 
of 2 or 3. 

Store length of numbers separately 

Here the idea is to also store the bit lengths of the numbers in $(a, A)$ and
$(b, B)$. In the divisions $A/B$ etc. the size of $q$ could be determined by one
comparison. Also the register sharing might be easier, allowing for fewer than
the current $8\sqrt{n}$ extra qubits. Another possibility might be to
synchronise the quantum parallel computations by the lengths of the numbers.
Then we would e.g. even classically know the size of $A$.

More classical pre-computation for GF(p) 

As mentioned earlier, we can assume that classical computation is much cheaper 
than quantum computation. Thus it might be reasonable to classically pre- 
compute and store many values specific to GF(p), if these values would help to 
make the quantum implementation of Euclid's algorithm easier. Unfortunately 
we haven't found any way of doing this. 

A quantum solution to arithmetic in GF(p) 

With an (approximate) quantum Fourier transform of size $p$, addition modulo
$p$ in a way becomes simpler [24]. It would be nice to find such a "quantum
solution" to both addition and multiplication. But to us it seems unlikely that
this is possible.

Binary extended Euclidean algorithm 

This is a variant of Euclid's algorithm (see e.g. [21], Vol. 2, p. 338) which
only uses additions, subtractions and bit shifts (divisions by 2). Basically one
can subtract $(b, B)$ from $(a, A)$, but one also divides a pair by 2 till the
second component is odd. We haven't managed to see that this algorithm is
piecewise reversible. Still, even if it isn't, it may be enough to keep
relatively little garbage around to make it reversible. (Our implementation of
this algorithm used $7n + \epsilon$ qubits and had a running time of $O(n^2)$.)

6 Results and a comparison with factoring 
6.1 Total time for the DLP algorithm 

Let's collect the total number of quantum gates for the whole discrete logarithm
algorithm. Remember that the success probability of the algorithm is close to
1 (appendix A), thus we assume that we have to do only a single run. We will
not actually go to the lowest level and count the number of gates, but rather
the number of ($n$-bit) additions.

In Table 2 we decompose each part of the DLP algorithm into its subroutines,
plus things that can be done directly (to the right). At the top are the $2n$
group shifts by a fixed (classical) elliptic curve point (section 4.1), $n$ for
$x \cdot P$ and $n$ for $y \cdot Q$. Each group shift is decomposed into 2
divisions (section 4.3.1), a multiplication to square $\lambda$, and a few
modular additions. Multiplications and additions here are understood to be
modulo $p$ (section 4.3.2).

- $2n$ group shifts (e.g. $|A\rangle \to |A + 2^i \cdot P\rangle$)
  - each group shift: 2 divisions + 1 multiplication (for squaring $\lambda$) + 5 additions
    - each multiplication: $3n$ additions
    - each division: 2 Euclid's + 2 multiplications
      - each multiplication: $3n$ additions
      - each Euclid: $4.5n$ cycles
        - each cycle: 5 operations + halting counter
          - each operation: 1 (short) addition + flag + counter operations
            - the addition: on average $\approx n/2 + 2\sqrt{n}$ bits
            - the flag and counter operations: on $\leq 3 \log_2 n$ bit registers

Table 2: DLP Algorithm Operations

As discussed in section 5.4.1 a running of Euclid's algorithm requires $4.5n$
cycles through the five operations. If $w$ represents the sizes of the $a$, $A$,
$b$ and $B$ registers then each of these cycles requires a $w$-bit compare to
zero (for the halting register), 4 $w$-bit additions/subtractions, a register
swap and some operations on $3 \log_2 n$ and $\log_2(3 \log_2 n)$ bit registers.
For our analysis we shall assume that all these operations together are
equivalent to 5 $w$-bit additions
(This is reasonable since the SWAP and compare to zero operations are quite 
easy compared to additions, see e.g. 0] fig. 10). After the first running of 
Euclid's algorithm we have found the inverse, but still have some garbage in 
the halting register. We saw in section 5.3.4 that the second running of Euclid's
algorithm will actually be the reverse of the above operations. Thus the run- 
ning time of the two instances of Euclid's algorithm will be double the above 
operations. 

To get a nice comparison to the factoring algorithm we need to know how many
classical-quantum additions are required (since the factoring algorithm uses
additions in which one summand is known classically [5]). In order to do this
we assume that a quantum-quantum addition is a factor of 1.7 times more
difficult than a classical-quantum addition (we estimated this ratio from the
networks in [5]). When register sharing is used, the sizes of the $a$, $A$, $b$
and $B$ registers change linearly between $2\sqrt{n}$ and $n + 2\sqrt{n}$. This
implies that on average $w = n/2 + 2\sqrt{n}$. This gives a total running time
of

$$T = 2n\,[5 + 3n + 2[6n + 2(4.5(5n))]] \cdot 1.7 \approx 360n^2$$

$n$-bit additions with no register sharing and

$$T = 2n\,[5 + 3n + 2[6n + 2(4.5 \cdot 5(n/2 + 2\sqrt{n}))]] \cdot 1.7 \approx 205n^2 + 615n^{3/2}$$

$n$-bit additions with register sharing.
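The two totals follow directly from the decomposition in Table 2 and can be
evaluated numerically. In the sketch below the function name
`total_additions` and its parameters are assumptions made for illustration;
the constants 5, 3, 4.5 and 1.7 are the ones used in the text.

```python
import math


def total_additions(n, register_sharing=False, qq_factor=1.7):
    """Total running time T, in units of n-bit classical-quantum additions."""
    # average register width w, expressed as a fraction of n
    w_over_n = (n / 2 + 2 * math.sqrt(n)) / n if register_sharing else 1.0
    euclid = 4.5 * n * 5 * w_over_n            # 4.5n cycles, 5 w-bit additions each
    division = 2 * (3 * n) + 2 * euclid        # 2 multiplications + 2 Euclid's
    group_shift = 5 + 3 * n + 2 * division     # 5 additions + 1 mult + 2 divisions
    return 2 * n * group_shift * qq_factor     # 2n group shifts


n = 163
print(total_additions(n) / n ** 2)                          # roughly 360
print(total_additions(n, register_sharing=True) / n ** 2)   # smaller with sharing
```

Expanding the expressions gives $357n^2 + 17n$ without register sharing and
$204n^2 + 612n^{3/2} + 17n$ with it, which the text rounds to $360n^2$ and
$205n^2 + 615n^{3/2}$.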

As a classical-quantum addition is $O(n)$ this implies that the discrete
logarithm algorithm is $O(n^3)$. Assume a running time of $k \cdot n$ for an
$n$-bit classical-quantum addition. Then the discrete logarithm algorithm has a
running time of approximately $360kn^3$ compared to only about $4kn^3$ for
factoring, but the larger $n$ needed for classical intractability more than
compensates for this (see section 6.3).

6.2 Total number of qubits (roughly $6n$)

For the number of qubits necessary for the discrete logarithm algorithm, what 
counts is the operations during which the most qubits are needed. Clearly this 
is during the extended Euclidean algorithm, and not e.g. in the course of a 
modular multiplication. 

In fact, the maximum qubit requirement will occur in the second call to the
Euclidean algorithm within each division (see section 4.3.1). Here we require
two $n$-bit registers plus a register on which to carry out the Euclidean
algorithm (see Table 3). Thus the DLP algorithm requires either
$f(n) = 7n + 4 \log_2 n + \epsilon$ or
$f'(n) = 5n + 8\sqrt{n} + 4 \log_2 n + \epsilon$ bits depending on whether
register sharing is used (see section 5.4.2). Therefore the DLP algorithm, like
the extended Euclidean algorithm, uses space $O(n)$.

- division: $2n$ + Euclid
  - Euclid: $(a, A) + (b, B)$ + carry qubits + $q + i$ + minor stuff
    - $(a, A) + (b, B)$: $2n + 8\sqrt{n}$ qubits
    - carry qubits: $n$ qubits
    - $q + i$: $4 \log_2 n$ qubits
    - minor stuff: $< 10$ qubits

Table 3: Maximum Bit Requirement With Register Sharing
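For concrete key sizes the two qubit counts can be evaluated directly. The
following sketch assumes $\epsilon = 10$ for the small fixed overhead; the
function names `qubits_plain` and `qubits_shared` are chosen for this example
only.

```python
import math


def qubits_plain(n, eps=10):
    """f(n) = 7n + 4 log2(n) + eps, without register sharing."""
    return 7 * n + 4 * math.log2(n) + eps


def qubits_shared(n, eps=10):
    """f'(n) = 5n + 8 sqrt(n) + 4 log2(n) + eps, with register sharing."""
    return 5 * n + 8 * math.sqrt(n) + 4 * math.log2(n) + eps


for n in (110, 163, 224, 256, 512):
    print(n, round(qubits_shared(n)), round(qubits_plain(n)))
```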



6.3 Comparison with the quantum factoring algorithm 

One of the main points of this paper is that the computational "quantum
advantage" is larger for elliptic curve discrete logarithms than for the better
known integer factoring problem. With our proposed implementation we have in
particular achieved similar space and time requirements. Namely the number of
qubits needed is also of $O(n)$ and the number of gates (time) of order
$O(n^3)$, although in both cases the coefficient is larger. Note that the input
size $n$ is also the key size for RSA resp. ECC public key cryptography. Because
the best known classical algorithms for breaking ECC scale worse with $n$ than
those for breaking RSA, ECC keys with the same computational security level are
shorter. Below is a table with such key sizes of comparable security (see e.g.
[25]). The column to the right roughly indicates the classical computing
resources necessary in multiples of $C$, where $C$ is what's barely possible
today (see e.g. the RSA challenges [22] or the Certicom challenges [17]).
Breaking the keys of the last line seems to be beyond any conceivable classical
computation, at least if the presently used algorithms can't be improved.



     Factoring algorithm (RSA)     |     EC discrete logarithm (ECC)      | classical
   n       ~ # qubits   time       |   n     ~ # qubits       time        |   time
           2n           4n^3       |         f'(n) (f(n))     360n^3      |
  ---------------------------------+--------------------------------------+-----------
   512     1024         0.54·10^9  |   110   700 (800)        0.5·10^9    |   C
   1024    2048         4.3·10^9   |   163   1000 (1200)      1.6·10^9    |   C·10^8
   2048    4096         34·10^9    |   224   1300 (1600)      4.0·10^9    |   C·10^17
   3072    6144         120·10^9   |   256   1500 (1800)      6.0·10^9    |   C·10^22
   15360   30720        1.5·10^13  |   512   2800 (3600)      50·10^9     |   C·10^60



Where $f(n)$ and $f'(n)$ are as in section 6.2 with $\epsilon = 10$. The time for the
quantum algorithms is listed in units of "1-qubit additions", thus the number
of quantum gates in an addition network per length of the registers involved.
This number is about 9 quantum gates, 3 of which are the (harder to implement)
Toffoli gates (see e.g. [5]). Also it seems very probable that for large scale
quantum computation error correction or full fault tolerant quantum computation
techniques are necessary. Then each of our "logical" qubits has to be encoded
into several physical qubits (possibly dozens) and the "logical" quantum gates
will consist of many physical ones. Of course this is true for both quantum
algorithms and so shouldn't affect the above comparison. The same is true for
residual noise (on the logical qubits) which will decrease the success probability 
of the algorithms. The quantum factoring algorithm may have one advantage, 
namely that it seems to be easier to parallelise. 
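The time columns of the table can be reproduced with a short sketch. The function names and the constant `GATES_PER_ADDITION` are our own labels; the figure of about 9 gates (3 of them Toffoli) per 1-qubit addition is the rough estimate quoted above, so the resulting gate counts are only indicative.

```python
GATES_PER_ADDITION = 9      # approximate gates per "1-qubit addition"
TOFFOLI_PER_ADDITION = 3    # of which Toffoli gates

def factoring_time(n):
    """Time for factoring an n-bit modulus, in 1-qubit additions (4 n^3)."""
    return 4 * n**3

def ecdlp_time(n):
    """Time for the n-bit elliptic curve discrete logarithm (360 n^3)."""
    return 360 * n**3

# (RSA modulus size, ECC key size) pairs of comparable classical security
pairs = [(512, 110), (1024, 163), (2048, 224), (3072, 256), (15360, 512)]
for rsa_bits, ecc_bits in pairs:
    t_rsa, t_ecc = factoring_time(rsa_bits), ecdlp_time(ecc_bits)
    print(f"RSA {rsa_bits:5d}: {t_rsa:.2e}   ECC {ecc_bits:3d}: {t_ecc:.2e}   "
          f"ECC gates ~ {t_ecc * GATES_PER_ADDITION:.1e}")
```

For the larger key sizes the elliptic curve time is the smaller of the two despite the larger coefficient, because the ECC keys are so much shorter.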
A Appendix: Detailed analysis of the success
probability

Here we analyse in some detail the success probability of the discrete logarithm 
quantum algorithm when we use the usual quantum Fourier transform of size 
$N = 2^n$, as opposed to the ideal case which would have prime size. The result
is, that the algorithm has a probability close to 1 of giving the right answer. 
Thus when looking at the runtime we will assume that a single run is enough. 

A.1 Order finding algorithm (basis of factoring)

We first consider the case of the order finding algorithm (section 2.2.1) which
is the basis of the factoring algorithm. The discrete logarithm case is then sim- 
ply a 2 dimensional version of this. Here we will use the eigenvalue estimation 
viewpoint introduced by Kitaev JT] (see also J7j). The advantage of this view- 
point is, that the (mixed) state of the register which we ultimately measure is 
explicitly written as a mixture of isolated "peaks" (thanks to Mike Mosca for 
pointing this out). In the usual picture, which we used in section 2.2.1, we have
the diagonalised form of the mixed state (or, equivalently, we use the Schmidt
normal form between the entangled registers) . But there we have to worry about 
destructive interference between different peaks, which makes the analysis a bit 
less nice. 

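So we want to find the order r of a group element a. Again we do:

\[ \frac{1}{\sqrt{N}} \sum_x |x\rangle |e\rangle \;\to\; \frac{1}{\sqrt{N}} \sum_x |x\rangle |a^x\rangle = \frac{1}{\sqrt{N}} \sum_x |x\rangle \, U_a^x |e\rangle \]

Where e is the neutral element and $U_a$ is multiplication by a, thus $U_a |g\rangle = |ag\rangle$.
(Eigenvalue estimation refers to the eigenvalues of $U_a$.) Now we write $|e\rangle$ in
terms of eigenstates of $U_a$. These r eigenstates are easy to find:

\[ |\Psi_k\rangle = \frac{1}{\sqrt{r}} \sum_{k'=0}^{r-1} \omega_r^{kk'} |a^{k'}\rangle \quad \text{with} \quad U_a |\Psi_k\rangle = \omega_r^{-k} |\Psi_k\rangle \quad \text{and} \quad \omega_r = e^{2\pi i/r} \]

It is also easy to see that $|e\rangle$ is simply a uniform superposition of these states:

\[ |e\rangle = \frac{1}{\sqrt{r}} \sum_k |\Psi_k\rangle \]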
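The two properties of the eigenstates used here can be checked numerically on a toy example. The sketch below assumes the (hypothetical) group generated by a = 2 modulo 7, which has order r = 3; states are represented as plain dictionaries from group elements to amplitudes, a representation chosen only for illustration.

```python
import cmath

p, a, r = 7, 2, 3                      # toy example: 2 has order 3 modulo 7
omega = cmath.exp(2j * cmath.pi / r)   # omega_r = e^{2 pi i / r}

def psi(k):
    """|Psi_k> as a dict mapping the group element a^k' to its amplitude."""
    return {pow(a, kp, p): omega ** (k * kp) / r ** 0.5 for kp in range(r)}

def apply_U(state):
    """U_a |g> = |a g>: relabel the basis states, keep the amplitudes."""
    return {(a * g) % p: amp for g, amp in state.items()}

def close(s, t):
    """True if two states agree amplitude by amplitude."""
    return all(abs(s[g] - t[g]) < 1e-12 for g in s)

for k in range(r):
    expected = {g: omega ** (-k) * amp for g, amp in psi(k).items()}
    print(k, close(apply_U(psi(k)), expected))   # eigenvalue omega_r^{-k}

# The uniform superposition of the |Psi_k> should equal |e> (here e = 1).
superposition = {g: sum(psi(k)[g] for k in range(r)) / r ** 0.5
                 for g in psi(0)}
print(superposition)
```

The amplitude of the neutral element comes out as 1 and the others vanish up to rounding error.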



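So the state of the quantum computer can be written as

\[ \frac{1}{\sqrt{N}} \sum_x |x\rangle \, U_a^x |e\rangle = \frac{1}{\sqrt{N}} \sum_x |x\rangle \frac{1}{\sqrt{r}} \sum_k \omega_r^{-kx} |\Psi_k\rangle \]

Now we apply the QFFT$_N$ to the first register to obtain:

\[ \frac{1}{\sqrt{r}} \sum_k \left( \frac{1}{N} \sum_{x'} \sum_x \omega_N^{xx'} \omega_r^{-kx} |x'\rangle \right) |\Psi_k\rangle \]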

Because the $|\Psi_k\rangle$ are orthogonal, the state of the first register alone can be
viewed as a mixture of r pure states, one for each k. The probabilities associated
with each of these pure states are equal, namely 1/r, as can be seen from the
previous equation. By summing the geometric series in the sum over x we get
for these (normalised) pure states:

\[ \frac{1}{N} \sum_{x'} \sum_{x=0}^{N-1} e^{2\pi i x (x'/N - k/r)} |x'\rangle = \frac{1}{N} \sum_{x'} \frac{e^{2\pi i N (x'/N - k/r)} - 1}{e^{2\pi i (x'/N - k/r)} - 1} |x'\rangle \]

\[ = \sum_{x'} e^{i\phi(x')} \frac{\sin(\pi(x' - kN/r))}{N \sin(\pi(x'/N - k/r))} |x'\rangle = \sum_{x'} e^{i\phi(x')} \frac{\sin(\pi(x' - x'_0))}{N \sin(\pi(x' - x'_0)/N)} |x'\rangle \]

Where $\phi(x')$ is some (irrelevant) phase. We see that each of these states is
dominated by basis states $|x'\rangle$ with

\[ x' \approx k \cdot N/r = x'_0 \]

Thus each of the pure states corresponds to one "smeared out" peak centered
at $x'_0$. Note that the argument of the sine in the denominator is small. So the
shape of the peak is approximately given by the function $\sin(\pi x)/(\pi x)$ sampled
at values for x which are integers plus some constant fractional offset, as plotted
in figure 1.

We are interested in the probability of observing a basis state no farther
away from the center $x'_0$ of the peak than, say, $\Delta x'$. How spread out the peak is
depends on the fractional offset. If there is no offset, then we simply observe the
central value with probability 1. The largest spread occurs for offset 1/2. (Then
the probabilities of the two closest basis states are each $4/\pi^2$.) The chance of
obtaining a state at distance $\Delta x'$ decreases as $1/(\Delta x')^2$. So the probability of
being away more than $\Delta x'$ on either side is at most about $2/\Delta x'$. Because the
total probability is normalised to 1, this tells us what the chance is of coming
within $\Delta x'$ of the central value.
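These statements about the peak shape can be checked numerically. The sketch below evaluates the squared amplitude $\sin^2(\pi(x'-x'_0)) / (N \sin(\pi(x'-x'_0)/N))^2$ for an assumed transform size N = 1024 and an assumed peak centre with the worst-case offset 1/2; the function name and the chosen numbers are illustrative only.

```python
import math

def peak_probability(x_prime, x0, N):
    """|amplitude|^2 of basis state x' for the peak centred at x0 = k*N/r."""
    delta = x_prime - x0
    denominator = N * math.sin(math.pi * delta / N)
    if abs(denominator) < 1e-12:          # x' sits exactly on the centre
        return 1.0
    return (math.sin(math.pi * delta) / denominator) ** 2

N = 1024
x0 = 100.5                                # worst case: fractional offset 1/2
probs = [peak_probability(x, x0, N) for x in range(N)]

print("total probability   :", sum(probs))
print("two closest states  :", probs[100], probs[101], "vs 4/pi^2 =", 4 / math.pi**2)
tail = sum(p for x, p in enumerate(probs) if abs(x - x0) > 5)
print("tail beyond distance 5:", tail, "(bound 2/5 =", 2 / 5, ")")
```

In this example the tail is far below the stated bound, which is deliberately generous.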






Figure 1: The function $\sin(\pi x)/(\pi x)$. Up to an (irrelevant) phase, the amplitudes near
a "peak" are given by sampling this function at integer intervals, as indicated
by the circles.

A.2 Discrete logarithm case

The discrete logarithm case is analogous, actually it can be viewed as a two 
dimensional version of the order finding algorithm. We have 

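\[ \frac{1}{N} \sum_{x,y=0}^{N-1} |x,y\rangle |e\rangle \;\to\; \frac{1}{N} \sum_{x,y} |x,y\rangle \, U_a^{x+dy} |e\rangle = \frac{1}{N} \sum_{x,y} |x,y\rangle \frac{1}{\sqrt{q}} \sum_{k=0}^{q-1} \omega_q^{-k(x+dy)} |\Psi_k\rangle \]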

By applying a Fourier transform of size N to each of the first two registers we 
get 

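\[ \frac{1}{\sqrt{q}} \sum_{k=0}^{q-1} \left( \frac{1}{N} \sum_{x',x} \omega_N^{xx'} \omega_q^{-kx} |x'\rangle \right) \left( \frac{1}{N} \sum_{y',y} \omega_N^{yy'} \omega_q^{-dky} |y'\rangle \right) |\Psi_k\rangle \]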

Again we get a peak for each k, and each with the same probability. The x' and 
y' values are independently distributed, each as in the above 1-dimensional case. 
For x' the "central value" is Nk/q and for y' it is Ndk/q. To obtain the values 
k and dk which we want, we multiply the observed x' , y' with q/N and round. 
Thus, if we choose N (= $2^n$) sufficiently larger than q, we are virtually guaranteed
to obtain the correct values, even if x' and y' are a bit off. Alternatively, we 
can try out various integer values in the vicinity of our candidate k and dk, 
thus investing more classical post-processing to make the success probability 
approach 1. 
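The classical post-processing described above can be sketched as follows. We assume here that q is prime (so that any nonzero k is invertible modulo q) and that the final step is to solve dk = d·k (mod q) for d; the function name `recover_dlog` and the example numbers (q = 101, N = 2^14, d = 37, k = 5) are hypothetical.

```python
def recover_dlog(x_obs, y_obs, q, N):
    """Estimate k and d*k from the observed x', y', then solve for d mod q."""
    # Round x'*q/N and y'*q/N to the nearest integer using integer arithmetic.
    k = ((2 * x_obs * q + N) // (2 * N)) % q
    dk = ((2 * y_obs * q + N) // (2 * N)) % q
    if k == 0:
        return None                      # k = 0 carries no information about d
    return (dk * pow(k, -1, q)) % q      # modular inverse needs Python 3.8+

# Peak centres for d = 37, k = 5 are N*5/q ~ 811.1 and N*84/q ~ 13626.3
# (since d*k = 185 = 84 mod 101); the observed values below are slightly off.
q, N = 101, 2**14
print(recover_dlog(812, 13625, q, N))    # → 37
```

Trying neighbouring integer candidates for k and dk, as suggested in the text, would amount to calling the final line with the rounded values shifted by ±1.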






B Appendix: Bounding the number of cycles 



It was shown in section 5.3.3 that the number of cycles required to complete the
Euclidean algorithm on inputs p and x is

\[ t(p,x) = 4 \sum_{i=1}^{r} \lfloor \log_2(q_i) \rfloor + r \]

where $q_1, q_2, \ldots, q_r$ are the quotients in the Euclidean algorithm.

Lemma 1 If p and x are coprime integers such that $p > x \ge 1$ and $p > 2$ then
$t(p,x) \le 4.5 \log_2(p)$.
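Before the proof, the lemma can be checked by brute force for small p. The sketch below computes t(p, x) as four times the sum of the floors of the base-2 logarithms of the quotients, plus the number of iterations r; the function name `cycles` and the search range are our own choices.

```python
import math

def cycles(p, x):
    """t(p, x) = 4 * sum(floor(log2(q_i))) + r for the Euclidean algorithm."""
    a, b = p, x
    total, r = 0, 0
    while b != 0:
        q = a // b
        total += 4 * (q.bit_length() - 1)   # floor(log2(q)) for q >= 1
        r += 1
        a, b = b, a - q * b
    return total + r

print(cycles(4, 1), cycles(2, 1))           # 9 and 5, the two boundary cases

# Exhaustive check of the lemma for all coprime pairs with 2 < p <= 300.
violations = [(p, x) for p in range(3, 301) for x in range(1, p)
              if math.gcd(p, x) == 1
              and cycles(p, x) > 4.5 * math.log2(p) + 1e-9]
print(violations)                           # expected to be empty
```

The small tolerance `1e-9` only guards against floating point rounding in cases of equality such as (4, 1).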

Proof: Assume by way of contradiction that there exist integers (p, x) for
which the lemma does not hold. Let (p, x) be an input for which the number
of Euclidean iterations, r, is minimal subject to the condition that the lemma
does not hold (i.e. $t(p,x) > 4.5 \log_2(p)$, $p > 2$, $p > x \ge 1$ and $\gcd(p,x) = 1$).
Let $q_1, \ldots, q_r$ be the quotients when the Euclidean algorithm is run on (p, x).

We will now obtain a contradiction as follows. First, we show that if $t(p,x) >
4.5 \log_2(p)$ then the Euclidean algorithm with input (p, x) will require at least
three iterations (i.e. $r \ge 3$). Next, we show that if $t(p,x) > 4.5 \log_2(p)$ and
the Euclidean algorithm run for two iterations on input (p, x) returns the pair
(y, z) then (y, z) also contradicts the lemma. Since (y, z) would contradict the
lemma with fewer iterations than (p, x) this contradicts the existence of (p, x).
It is easily verified that the lemma holds provided $2 < p \le 15$ (simply calculate
t(p, x) for each of the possibilities). We can thus assume that $p \ge 16$.

Recall that the Euclidean algorithm takes as input two integers (a, b) and
terminates when one of a and b is set to zero, at which point the other integer
will be gcd(a, b). An iteration of the Euclidean algorithm on (a, b), with $a \ge b$,
returns (a − qb, b), where $q = \lfloor a/b \rfloor$. Note that since gcd(p, x) = 1, on this input
the algorithm will terminate with either (1, 0) or (0, 1).

Let us first prove that the Euclidean algorithm with input (p, x) will require
at least three iterations. Since neither p nor x is zero we know that $r \ge 1$.
Suppose that r = 1. Then the single iteration of the algorithm transforms (p, x)
to $(p - q_1 x, x) = (0, 1)$. This implies that x = 1 and $q_1 = p$. Thus

\[ t(p,x) = 4 \lfloor \log_2(p) \rfloor + 1 \le 4.5 \log_2(p) \quad \text{(since } p > 2\text{)} \]

which implies that $r \ge 2$. Suppose that r = 2. Then the two iterations of the
algorithm would transform

\[ (p, x) \to (p - q_1 x, x) \to (p - q_1 x, \; x - q_2 (p - q_1 x)) = (1, 0) \]

This implies that $p - q_1 x = 1$ and $q_2 = x$. Thus $p - q_1 q_2 = 1$, which implies
that $\log_2(p) > \log_2(q_1) + \log_2(q_2)$. Therefore

\begin{align*}
t(p,x) &= 4 \lfloor \log_2(q_1) \rfloor + 4 \lfloor \log_2(q_2) \rfloor + 2 \\
       &\le 4 \lfloor \log_2(q_1) + \log_2(q_2) \rfloor + 2 \\
       &\le 4 \lfloor \log_2(p) \rfloor + 2 \\
       &\le 4.5 \log_2(p) \quad \text{(since } p \ge 16\text{)}
\end{align*}

and we have that $r \ge 3$. Note that we now know $x \ne 1, 2$, $p - q_1 x \ne 1$ and
$x - q_2 (p - q_1 x) \ne 0$ since any of these would imply $r \le 2$.

We shall now establish that $q_1 \in \{1, 2\}$. After the first iteration of the
Euclidean algorithm the problem is reduced to running the algorithm on
$(p - q_1 x, x)$, for which the quotients will be $q_2, \ldots, q_r$. Since $x q_1 < p$ we have that
$\log_2(p) > \log_2(x) + \log_2(q_1)$. Therefore

\begin{align*}
t(x, p - q_1 x) &= 4 \sum_{i=2}^{r} \lfloor \log_2(q_i) \rfloor + r - 1 \\
                &= t(p,x) - (4 \lfloor \log_2(q_1) \rfloor + 1) \\
                &> 4.5 \log_2(p) - (4 \lfloor \log_2(q_1) \rfloor + 1) \\
                &> 4.5 \log_2(x) + 4.5 \log_2(q_1) - 4 \lfloor \log_2(q_1) \rfloor - 1 \\
                &\ge 4.5 \log_2(x) \quad \text{(if } q_1 \ge 3\text{)}
\end{align*}

Thus if $q_1 \ge 3$ then $t(x, p - q_1 x) > 4.5 \log_2(x)$, $x > 2$ and $x > p - q_1 x \ge 1$, but
this would contradict the minimality of r. Therefore $q_1 \in \{1, 2\}$.

After two iterations of the Euclidean algorithm on (p, x) the problem has
been reduced to running the algorithm on $(p - q_1 x, \; x - q_2 (p - q_1 x))$. We will
now show that the lemma does not hold for $(p - q_1 x, \; x - q_2 (p - q_1 x))$. This will
contradict the minimality of r and thus the existence of (p, x). To do this, we
must first show that $p - q_1 x > 2$ and that $p - q_1 x > x - q_2 (p - q_1 x) \ge 1$ (so that
the lemma applies). As discussed above, since $r \ge 3$ we know that $p - q_1 x > 1$
and that $p - q_1 x > x - q_2 (p - q_1 x) \ge 1$, thus we need only show that $p - q_1 x \ne 2$.

Suppose that $p - q_1 x = 2$. Since $q_1 \in \{1, 2\}$ either p = x + 2 or p = 2x + 2.
Since gcd(p, x) = 1 this implies that x is odd and that the Euclidean algorithm
will proceed as follows

\[ (p, x) \to (2, x) \to (2, 1) \to (0, 1) \]

Thus r = 3, $q_2 = (x - 1)/2$, $q_3 = 2$ and

\begin{align*}
t(p,x) &= 4 \lfloor \log_2(q_1) \rfloor + 4 \lfloor \log_2((x - 1)/2) \rfloor + 4 \lfloor \log_2(2) \rfloor + 3 \\
       &= 4 \lfloor \log_2(q_1 x - q_1) \rfloor + 3 \\
       &\le 4.5 \log_2(p) \quad \text{(with } p = q_1 x + 2\text{)}
\end{align*}

where the last line follows by checking the values for $q_1 \in \{1, 2\}$ and x < 64 and
noting that $4.5 \log_2(q_1 x) \ge 4 \log_2(q_1 x) + 3$ when $x \ge 64$. This would contradict
the fact that the lemma doesn't hold for (p, x), thus $p - q_1 x \ne 2$.

Now to complete the proof we need only show that $t(p - q_1 x, \; x - q_2 (p - q_1 x)) >
4.5 \log_2(p - q_1 x)$. Let x = cp, so $p - q_1 x = (1 - q_1 c) p$ with $1 > 1 - q_1 c > 0$. By
the Euclidean algorithm we know that $x \ge q_2 (p - q_1 x)$ and thus

\[ \log_2(x) = \log_2(p) + \log_2(c) \ge \log_2(p) + \log_2(1 - q_1 c) + \log_2(q_2) \]

Therefore $\log_2(c/(1 - q_1 c)) \ge \log_2(q_2)$, which implies $1 - q_1 c \le 1/(1 + q_1 q_2)$.
This in turn implies that $\log_2(p - q_1 x) = \log_2(p) + \log_2(1 - q_1 c) \le \log_2(p) -
\log_2(1 + q_1 q_2)$. Hence

\begin{align*}
t(p - q_1 x, \; x - q_2 (p - q_1 x)) &= 4 \sum_{i=3}^{r} \lfloor \log_2(q_i) \rfloor + r - 2 \\
  &= t(p,x) - (4 \lfloor \log_2(q_2) \rfloor + 4 \lfloor \log_2(q_1) \rfloor + 2) \\
  &> 4.5 \log_2(p) - (4 \lfloor \log_2(q_2) \rfloor + 4 \lfloor \log_2(q_1) \rfloor + 2) \\
  &\ge 4.5 \log_2(p - q_1 x) + Z(q_1, q_2)
\end{align*}

where $Z(q_1, q_2) = 4.5 \log_2(1 + q_1 q_2) - (4 \lfloor \log_2(q_2) \rfloor + 4 \lfloor \log_2(q_1) \rfloor + 2)$.

If $q_1 = 1$ then $Z(q_1, q_2) = 4.5 \log_2(1 + q_2) - (4 \lfloor \log_2(q_2) \rfloor + 2)$. It is easy to
check that $Z(1, q_2)$ is non-negative when $q_2 \in \{1, \ldots, 14\}$ and if $q_2 \ge 15$ then
$Z(1, q_2) \ge 0.5 \log_2(1 + q_2) - 2 \ge 0$. Therefore $Z(q_1, q_2) \ge 0$ when $q_1 = 1$.

If $q_1 = 2$ then $Z(q_1, q_2) = 4.5 \log_2(1 + 2 q_2) - (4 \lfloor \log_2(q_2) \rfloor + 6)$. It is easy to
check that $Z(2, q_2)$ is non-negative when $q_2 \in \{1, \ldots, 7\}$ and if $q_2 \ge 8$ then

\begin{align*}
Z(2, q_2) &= 4.5 \log_2(1 + 2 q_2) - (4 \lfloor \log_2(q_2) \rfloor + 6) \\
          &> 4.5 (\log_2(q_2) + 1) - (4 \lfloor \log_2(q_2) \rfloor + 6) \\
          &\ge 0.5 \log_2(q_2) - 1.5 \\
          &\ge 0
\end{align*}

Therefore $Z(q_1, q_2) \ge 0$ when $q_1 = 2$.
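The "easy to check" finite cases for Z can be verified mechanically. The sketch below evaluates $Z(q_1, q_2)$ as defined above; the helper name `floor_log2` and the upper limit of the additional sanity check are our own choices.

```python
import math

def floor_log2(q):
    """floor(log2(q)) for a positive integer q."""
    return q.bit_length() - 1

def Z(q1, q2):
    """Z(q1, q2) = 4.5 log2(1 + q1 q2) - (4 floor(log2 q2) + 4 floor(log2 q1) + 2)."""
    return (4.5 * math.log2(1 + q1 * q2)
            - (4 * floor_log2(q2) + 4 * floor_log2(q1) + 2))

# Finite cases used in the proof.
print(all(Z(1, q2) >= 0 for q2 in range(1, 15)))   # q2 in {1, ..., 14}
print(all(Z(2, q2) >= 0 for q2 in range(1, 8)))    # q2 in {1, ..., 7}

# Additional sanity check well beyond the finite cases.
print(min(Z(q1, q2) for q1 in (1, 2) for q2 in range(1, 10001)))
```

The minimum over this range is positive, in line with the analytic bounds for large $q_2$.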

Thus $Z(q_1, q_2) \ge 0$ and we have that $t(p - q_1 x, \; x - q_2 (p - q_1 x)) > 4.5 \log_2(p -
q_1 x)$. This contradicts the minimality of r and thus the existence of (p, x).
Therefore the lemma holds. □



Note that $t(4, 1) = 9 = 4.5 \log_2(p)$ and thus the bound is tight. It is also
worth noting that $t(2, 1) = 5 = 5 \log_2(p)$ which is why the requirement p > 2
was included in the lemma.